The US Health and Retirement Study (HRS), which I’ve been using, has polygenic scores for schizophrenia. I pointed out here that there are large ethnic differences in rates of schizophrenia diagnosis. The HRS polygenic scores are normalized to mean 0, variance 1 separately in blacks and whites, but the documentation provides statistics on the original scores:
These are very different distributions.1 The median European score is below the minimum African score, and the median African score is above the maximum European score. That is, more than half of African-American respondents scored above the highest-scoring of the roughly 12,000 European-ancestry respondents. We can back out the standard deviation from the standard error of the mean (SD = SE × √n): 0.77 × √12090 ≈ 84.7 for Europeans, and about the same for Africans. The sample means are about five European standard deviations apart.
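The back-of-envelope step above can be written as a minimal Python sketch. It uses only the two European figures quoted in the text (standard error 0.77, sample size 12,090); the function name is purely illustrative.

```python
import math

def sd_from_se(standard_error: float, n: int) -> float:
    """Recover a sample standard deviation from the standard error of the mean."""
    # SE = SD / sqrt(n), so SD = SE * sqrt(n)
    return standard_error * math.sqrt(n)

# European-ancestry figures quoted in the text: SE = 0.77, n = 12,090
european_sd = sd_from_se(0.77, 12090)
print(round(european_sd, 1))
```

The same function would give the African-ancestry figure from that group's standard error and sample size in the HRS documentation.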
People have noticed this! As one article title drily puts it, “Polygenic risk score for schizophrenia is more strongly associated with ancestry than with schizophrenia”. The article graphs the pattern across different ancestries:
Geneticists, and hopefully my regular readers, will recognize that this picture is more of a question than an answer. Polygenic scores (PGS) are created by adding up the predicted effects of many genetic variants on the outcome of interest. The variant predictions are usually calculated within a single ethnicity. Otherwise, any variant that differs between, say, black and white people would predict any outcome that also differs between them, whether or not the variant had any causal impact.

Even so, there are ways that spurious correlation can get into the scores. Suppose people who live in cities are more likely to be diagnosed with schizophrenia, and suppose white people who live in cities have more of certain genetic variants, just because different people moved to cities in the past and some of their descendants still live there. Then those variants will predict schizophrenia and go into the PGS. Now, suppose black people who live in cities also have the same variants, and suppose more blacks than whites live in cities. Then those variants will show a difference between the groups, and will also predict schizophrenia within each group, even though they have no causal effect and are just picking up an environmental difference.
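The city story can be made concrete with a toy simulation. This is only a sketch: every probability below is invented for illustration, the two groups are abstract ("A" with fewer city dwellers, "B" with more), and the variant is built to have no causal effect at all.

```python
import random

random.seed(0)

def simulate_person(p_urban):
    """One person: urban status, carrier status for a NON-causal variant, diagnosis."""
    urban = random.random() < p_urban
    # Assumption: the variant is more common in cities for historical reasons only.
    carrier = random.random() < (0.6 if urban else 0.3)
    # Diagnosis depends on the environment (urban or not) alone, never on the variant.
    diagnosed = random.random() < (0.03 if urban else 0.01)
    return urban, carrier, diagnosed

def simulate_group(n, p_urban):
    return [simulate_person(p_urban) for _ in range(n)]

def diagnosis_rate(people, carrier_value):
    """Share diagnosed among carriers (True) or non-carriers (False)."""
    subset = [d for _, c, d in people if c == carrier_value]
    return sum(subset) / len(subset)

def carrier_share(people):
    return sum(c for _, c, _ in people) / len(people)

group_a = simulate_group(200_000, p_urban=0.3)  # fewer city dwellers
group_b = simulate_group(200_000, p_urban=0.7)  # more city dwellers

# Within each group, carriers are diagnosed more often than non-carriers,
# so the variant would earn a positive weight despite zero causal effect.
print(diagnosis_rate(group_a, True), diagnosis_rate(group_a, False))
print(diagnosis_rate(group_b, True), diagnosis_rate(group_b, False))
# The variant is also more common in group B, so a score built on it differs by group.
print(carrier_share(group_a), carrier_share(group_b))
```

Nothing here depends on the particular numbers; any setup where the environment drives both the outcome and the variant's frequency gives the same qualitative pattern.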
There are other caveats too. What if we’re only picking up variants that matter in the white population? What if there are variants that matter in the black population, and blacks have fewer of those variants associated with schizophrenia than whites, but they aren’t captured on the DNA array chip?2 What if some variants are really causal, but only because they expose the carrier to environments that lead to schizophrenia? (Think of variants that affect skin colour and expose you to more discrimination.) If so, then what matters is probably changing the relevant environment.
Despite all this — or even because of it — if there is a large difference in a genetic predictor for schizophrenia between ethnic groups, and there is also a large difference in schizophrenia itself, then that is an important avenue to investigate.
How are psychiatric researchers handling these differences? Hmm, they’re kind of on it.
For instance, in the US, black schizophrenia patients are less likely to get drug treatment than whites. One reason is that the main effective drug, clozapine, lowers your white blood cell count. And black people are particularly likely to have to stop treatment because of this: the resulting condition is called neutropenia (I have had to swiftly learn a lot of technical terms for this piece). From a recent article:
We sought to identify risk alleles in the first genome-wide association study of neutrophil levels during clozapine treatment, in 552 individuals with treatment-resistant schizophrenia and robustly inferred African genetic ancestry. Two genome-wide significant loci were associated with low neutrophil counts during clozapine treatment…. Individuals homozygous for the C allele at rs2814778 were significantly more likely to develop neutropenia and have to stop clozapine treatment (OR=20.4, p=3.44x10^-7). This genotype, also termed ‘Duffy-null’, has previously been shown to be associated with lower neutrophil levels in those of African ancestry….
This result got mentioned in a big Nature review article on the advantages of more diverse samples in genomics:
The first GWAS of neutrophil counts during clozapine treatment (for schizophrenia) conducted in individuals of African ancestry identified a key role for the well-known African ancestry-specific Duffy allele in risk of neutropenia upon treatment, thereby improving our understanding of the differential rates of discontinuation of clozapine by ethnicity.
That paper goes on:
Although the social and economic factors are the most significant contributors to health disparities, genomic research has the potential to help unravel disparities in health outcomes that are inappropriately attributed to race. The discovery of kidney disease risk variants in the APOL1 gene that are found predominantly in individuals with African ancestry demonstrated this potential….[many more examples]… With continuing work on understanding how genetic factors contribute to health disparities, inadequate characterizations based on race can be replaced by screening for relevant markers, with improved targeting and the potential for novel biological insights.
The phrasing is a little disingenuous here. On the one hand, yeah, absolutely, if you can directly test for the Duffy-null allele to figure out whether to give someone clozapine, then that’s better than just using their ethnicity! On the other hand, if your patient is having a psychotic episode and a genetic test will take three days, then the patient’s race might be a helpful rule of thumb, and the result above explains why. Calling these disparities “inappropriately attributed to race” is a bit like saying “we mistakenly thought Wise Owl lived in the Hundred Acre Wood, but now we know better: Wise Owl lives in the Old Oak Tree in the Hundred Acre Wood!” A reasonable definition of the old-fashioned concept of race would be “biological differences associated with ethnicity”. Finding a biological difference associated with ethnicity does not contradict that.
But the important thing is that the medical research is getting done, and improving outcomes for patients from different ethnic groups.3 A certain amount of awkward circumlocution is a price worth paying.
Here is a social science kicker. The HRS polygenic scores for educational attainment show a similarly big difference in distributions between black and white respondents.
The median European-ancestry score is only a little below the maximum African-ancestry score. The means are about two European standard deviations apart, or about 1.5 standard deviations of the whole sample. These differences are not as big as for schizophrenia, but they are very big — bigger than I expected.4 You can be absolutely sure that, as for schizophrenia, the educational attainment PGS will be “more strongly associated with ancestry than it is with educational attainment”.
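To get a feel for what a gap measured in standard deviations means, here is a rough sketch of the chance that a random draw from the higher-mean group beats a random draw from the lower-mean group. It assumes both groups are normally distributed with the same standard deviation, which is a simplification and not something the HRS documentation states. Under that assumption a gap of exactly two standard deviations gives roughly 92%, and a gap of about 2.2 gives roughly 94%.

```python
import math

def prob_higher(gap_in_sd: float) -> float:
    """P(draw from higher-mean group > draw from lower-mean group).

    Assumes two normal distributions with the SAME standard deviation,
    with means `gap_in_sd` standard deviations apart.
    """
    # The difference of two independent normals has SD sqrt(2), so the
    # probability is Phi(gap / sqrt(2)) = 0.5 * (1 + erf(gap / 2)).
    return 0.5 * (1.0 + math.erf(gap_in_sd / 2.0))

for gap in (1.5, 2.0, 2.2):
    print(gap, round(prob_higher(gap), 3))
```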
All the caveats for the schizophrenia score apply in spades here! About half of the “effect” of the polygenic score for educational attainment is non-causal, i.e. fake. We know that because among siblings, where differences in the score are random, the score’s effect on education is halved. So, if you take two random white people, and predict their education from their genes with this score, about half of the predicted difference will not really be caused by their genes.
What about if you take a random black person and a random white person? How much of the difference will be truly caused by genetics? We don’t know, because we don’t know whether the difference above is in the real or the fake part of the score. If the score is the sum of a REAL (causal) part and a FAKE (non-causal) part, and the average score differs between the two groups, then it could be that the REAL part is the same between the ethnic groups, and only the FAKE part is different; or only the REAL part is different; or any mix of the two. Intuitively, you might expect more than 50% to be the fake part, because we already know that black and white people live in very different environments in the US, so there are a lot of environmental differences that the FAKE part could be picking up. But I don’t know!
Nerdy aside: in particular, even what I wrote here, that we could estimate the effects of the difference using within-family regressions on PGS, is wrong. Here’s why: suppose all the black-white differences are in the FAKE part of PGS. If we take a white sample, and use an estimate of the causal effect of PGS from within that sample, we can answer the question “what would happen if white people had the same distribution of scores as black people?” But that isn’t the right question. We want to know “what if white people had the same distribution of genetic variants as black people?” By counterfactually changing the scores to match the black sample, you are estimating the effect of giving white people more of both REAL and FAKE. But the black sample only has more of FAKE. You really need causal polygenic scores (direct estimates of REAL), which are created using within-family regressions. Well, people are making those, and like the original PGS, they will sooner or later become available to the wider scientific community.
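The point of the aside can be checked with a few lines of arithmetic. This is a sketch under stated assumptions, not an estimate of anything: the score is taken to be PGS = REAL + FAKE, FAKE has zero causal effect, and the within-family coefficient on the whole score is set to half the causal effect of REAL (as it would be if REAL and FAKE varied equally between siblings). All numbers and names are illustrative.

```python
def scenario(real_gap: float, fake_gap: float,
             causal_effect_of_real: float = 1.0,
             within_family_pgs_effect: float = 0.5):
    """Compare the true genetic contribution to an outcome gap with the figure
    obtained by multiplying the total PGS gap by a within-family coefficient.

    Illustrative assumptions: PGS = REAL + FAKE, and FAKE has no causal effect.
    """
    total_gap = real_gap + fake_gap
    true_contribution = causal_effect_of_real * real_gap
    naive_estimate = within_family_pgs_effect * total_gap
    return total_gap, true_contribution, naive_estimate

# The same total PGS gap of 2.0, split three different ways.
for real_gap, fake_gap in [(2.0, 0.0), (1.0, 1.0), (0.0, 2.0)]:
    print(real_gap, fake_gap, scenario(real_gap, fake_gap))
```

The naive estimate is identical in all three cases, while the true contribution ranges from the whole gap down to zero. That is why the total score gap plus a within-family coefficient cannot settle the question.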
It’s worth thinking about what can be gained specifically from comparing different ethnic groups, rather than simply doing research on one or other group in isolation.
One answer is to help us understand how society works. Differences between groups are objects of intense social and political debate. In particular, when differences in outcomes are not purely due to the social environment, then it is important to understand that. If differences in schizophrenia diagnosis or outcomes are not, or not mainly, due to racism in the medical system, or society more generally, then we should not accuse doctors or the medical system of racism. Making that argument will often be controversial — which is why it is important.
But we should aim for more than that. Differences in many outcomes matter. Often, we should want to minimize them, even if they are partly rooted in genetics and biology. (And in any case, as I keep saying, genetics are social outcomes themselves.)
We would like the study of intergroup differences not just to inform the political debate, but actually lead to new ways of helping disadvantaged groups. We owe that to the people who gave us their DNA samples. Sometimes it is literally what they were promised.
Bluntly, one reason people are afraid of genetic research is that it seems to come with the following implication: “you thought these big inequalities were caused by social injustice. So sorry, they’re natural! Nothing to be done!” Cue crocodile tears. We should aim higher than that. (In particular, notice how this argument differs from the previous point. Saying “don’t blame racism” is not the same as “don’t do anything”.)
I can think of two ways that studying intergroup differences specifically might help us. First, it might give us clues to causal pathways from DNA to social or medical outcomes. Second, we might learn how genes interact with different environments.
Black and white Americans have different polygenic scores because they have more or less of a certain set of genetic variants. I doubt that those differences are evenly distributed across the variants: that, say, black people are 10% more likely to have every one of the alleles that predict lower educational attainment. It is more likely that the differences are concentrated in certain places, or certain kinds of variant. (For example, as I suggested, they could be concentrated in the FAKE variants that predict education but don’t cause it.) Looking at where they are concentrated would give us a set of variants which are potentially linked to intergroup inequality.
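One simple way to look for that concentration, sketched here with made-up numbers, is to split the gap in mean score into per-variant contributions. The sketch assumes a plain additive score (weight times allele count, summed over variants); the variant names, weights and allele frequencies are hypothetical and do not come from the HRS or any real GWAS.

```python
# Hypothetical variants: (name, PGS weight, allele frequency in group A, in group B).
# None of these numbers come from real data.
variants = [
    ("v1", 0.02, 0.30, 0.32),
    ("v2", -0.01, 0.50, 0.49),
    ("v3", 0.05, 0.10, 0.40),
    ("v4", 0.01, 0.80, 0.78),
]

def mean_score_gap_by_variant(variants):
    """Per-variant contribution to the gap in mean PGS (group B minus group A).

    With an additive score, mean PGS = sum(weight * 2 * frequency), so each
    variant contributes weight * 2 * (freq_b - freq_a) to the gap.
    """
    return {name: w * 2 * (fb - fa) for name, w, fa, fb in variants}

contributions = mean_score_gap_by_variant(variants)
total_gap = sum(contributions.values())

# Rank variants by the absolute size of their contribution to the gap.
for name, c in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(name, round(c, 4), f"{c / total_gap:.0%} of gap")
```

In this invented example a single variant accounts for almost all of the gap, which is the kind of pattern that would point to a short list of variants worth following up.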
The psychiatric research above is maybe an example of this. Knowing the interethnic difference in frequency for the Duffy-null allele helped generate the hypothesis that this affected differences in response to clozapine. This in turn is an “intermediate phenotype” which leads to different outcomes for schizophrenia.
Behaviour geneticists and social scientists could search for similar intermediate phenotypes which are on the causal pathway between genetics and ultimate outcomes like staying in school or graduating from college. In our case, they might not be deep biological phenotypes like white blood cell counts, more like psychological outcomes (“IQ at age 11”) or social ones (“having books in the home” — a candidate correlate of a FAKE variant, maybe). Which makes me think that it is time to get reacquainted with our James Heckman and Roland Fryer/Steven Levitt.
A related point is that there can be gene-environment interactions. Obvious example: all the genes in the world won’t get you to college if you are a girl in today’s Afghanistan. If we understand how genes take effect in different environments, then we might nudge the environment to equalize outcomes. (Obviously, there are smart and silly ways to do this. You can give everyone higher “educational attainment” just by forcing them to stay in school longer; you can reduce test score inequality if you just don’t do the tests. If only I was kidding. We want interventions that improve real outcomes, and level up not down.) Research on this might look at interethnic differences in environments, to see how they interact with genetics.
Doing all of this would require a big research effort. The persistence of racial inequality, despite attempts to remove “obvious” kinds of racial discrimination from the 1950s onwards, generated a large amount of social science aimed at estimating how environmental differences, from hidden discrimination to socioeconomic deprivation, affected those outcomes. Some strands of that research are more robust than others, but overall a lot of important work has been done. Now we will need an equal amount of work to understand the interplay between genes and environments in an intergroup context. Many researchers from the environmental traditions are likely to welcome that idea the way a cat welcomes veterinary pills. But in the end it will need to be done.
My Airbnb flat has a huge widescreen TV so I am entertaining myself by watching The Wire. (An advantage of being behind the times: other people’s passé pleasures are fresh for you.) Season Four, the one about schools. I don’t care much about the quality of its sociopolitical analysis, that’s not what TV shows are for. One of its strengths is the sheer depth of acting talent in the cast. Another is just that it has a very big heart.
I believe in scientific freedom, I believe what I’ve written here, I have a tee shirt that says EPPUR SI MUOVE and another that says SAPERE AUDE.5 In the end, I still feel sad about all of this. I used to think that genes might make, maybe not zero contribution to racial inequality, but a small or negligible one. Despite all the caveats, after seeing the stats above, I now think that is much less likely.
1. To estimate the true population difference, you should adjust for the HRS survey structure and weighting. You’d probably also want to regress out some principal components of genetic data. But differences this big won’t be killed by that kind of tweaking.
2. As Abdel Abdellaoui and coauthors put it:
Negative selection pressures can give rise to population-specific genetic architectures and causal variants for the same traits, making it difficult to detect trait mean differences by comparing polygenic scores based on GWASs from a single population. Theoretical work indicates that a polygenic trait constrained by stabilizing selection to a certain optimum phenotypic value in two populations can, counter-intuitively, increase the genetic differentiation of trait-influencing loci: genetic variants that accidentally increase in frequency as a result of drift in one population would lead to a compensatory decrease in frequency of other loci in this population.
Got that?
3. I haven’t seen anything specifically using the schizophrenia polygenic score in this context. But I’m not an expert and only took a quick look; there may be work out there or in progress.
4. With distributions this far apart, about 94% of the time a randomly drawn person from the higher-mean group will score higher than a randomly drawn person from the lower group.
5. “Yes it’s moving”, Galileo. “Dare to know”, Kant.