Here is a good, very comprehensive introduction by Scott Alexander to the debate about missing heritability. This post is much shorter and is going to just focus on one aspect — the value of polygenic scores.
I think you cannot really talk about the magnitude/value of an effect size without first explaining what you intend to use it for. There are rare variants explaining a tiny amount of population variance but providing very important information for drug targets (which can then explain a large amount of population variance when intervened on). On the other hand, a smoking polygenic score might explain a decent amount of variance in lung cancer risk but will be useless in a model that already includes smoking itself. What is the intended use of the PGS? A lot of the existing applications in behavior genetics want to have a "clean" causal instrument, and the population PGS is clearly not that (and, I would argue, it is confounded in ways that can be extremely misleading).
The more general point I'd make is that if a field describes a trait with 80% heritability (like height) as "largely genetic" then it should describe a trait with 10-20% heritability (like IQ is shaping up to be and Edu already is) as "largely non-genetic". That's just being consistent.
Yup, I agree. I didn't want to go off into that here, but it is worth another post.
About "largely genetic" vs "largely non-genetic". Broadly, sure, but be careful. Suppose we halve the number of questions in an IQ test. The IQ test now has more noise in it, and heritability drops accordingly. Has it become "more non-genetic"? Only in the sense of containing more randomness! The underlying thing we are trying to measure hasn't changed at all. Again, this is a problem with the heritability statistic. Non-heritability includes everything from systematic environmental variation, which the social scientist can in principle discover, to subtle obscure effects which might be hard to capture systematically, to pure noise, which is just not ever gonna be accounted for by anything.
So, it might be simultaneously true that a trait is only 10% heritable, and that _all_ its systematic variation comes from genetics; there is simply nothing else there to discover, and nothing else that we can target if we want to change it. Calling that "largely non-genetic" would be misleading, I think.
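To make the test-shortening point concrete, here is a minimal sketch under classical test theory assumptions (measurement error independent of genes and of everything else), which the comment above does not spell out. The Spearman-Brown formula gives the reliability of a shortened test, and the observed heritability is the heritability of the error-free trait times that reliability. The starting values (trait heritability 0.5, full-test reliability 0.9) are hypothetical and chosen only for illustration.

```python
def spearman_brown(reliability, length_factor):
    """Reliability of a test whose length is multiplied by length_factor."""
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


def observed_heritability(true_h2, reliability):
    """Heritability of the noisy score, assuming the noise is independent of genes."""
    return true_h2 * reliability


TRUE_H2 = 0.5           # hypothetical heritability of the error-free trait
FULL_RELIABILITY = 0.9  # hypothetical reliability of the full-length test

half_reliability = spearman_brown(FULL_RELIABILITY, 0.5)
h2_full = observed_heritability(TRUE_H2, FULL_RELIABILITY)
h2_half = observed_heritability(TRUE_H2, half_reliability)

print(f"full test: reliability={FULL_RELIABILITY:.3f}, observed h2={h2_full:.3f}")
print(f"half test: reliability={half_reliability:.3f}, observed h2={h2_half:.3f}")
```

In this sketch the trait itself is untouched; only the reliability changes, and the measured heritability falls with it.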
Specifically about EA, if you accept the numbers for the effect of polygenic scores in sibling regressions, then the relevant comparison isn't "everything else including noise" - it is other systematic environmental effects. ChatGPT gives some useful examples: https://chatgpt.com/share/68651971-9e04-8010-bed9-f664b4ebfc7f
What you see there is that there are some environmental effects with the same size as moving from the 5th to the 95th percentile of the EA PGS. But they are intensive: things like high-quality preschool (if that replicates) or cash rewards for completion (but you might worry about perverse incentives). Other effects, like small class sizes, or better teachers, are much smaller, unless you implement them over many years.
The point isn't that we should all be investing in embryo selection. Just that a fair statement is "genetic effects on education are big and substantive, as big as the biggest environmental effects we know about".
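The 5th-versus-95th-percentile comparison can be sketched numerically. This assumes a normally distributed score and a linear, positive relationship, so the standardized slope is the square root of R2; the R2 values below are hypothetical placeholders, not estimates from any study.

```python
from math import sqrt
from statistics import NormalDist


def percentile_gap_effect(r2, low=0.05, high=0.95):
    """Expected outcome difference, in outcome SDs, between two predictor percentiles.

    Assumes a normally distributed predictor and a linear, positive relationship,
    so the standardized slope equals sqrt(r2).
    """
    z = NormalDist()
    gap_in_predictor_sds = z.inv_cdf(high) - z.inv_cdf(low)
    return sqrt(r2) * gap_in_predictor_sds


for r2 in (0.01, 0.04, 0.10):  # hypothetical R2 values, for illustration only
    effect = percentile_gap_effect(r2)
    print(f"R2={r2:.2f}: 5th vs 95th percentile differ by {effect:.2f} outcome SDs")
```

The gap between the 5th and 95th percentiles is about 3.3 SDs of the score, which is why a small R2 can still correspond to a visible difference between the extremes.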
Urgh, I just lost an entire comment. I'm on a train... Trying again.
I don't think an appeal to rare variants is going to bail PGS out of their small effect size hole. My standard example of a genetic predictor with a large unstandardized regression coefficient and a small R^2 is Down's syndrome. Down's has a huge causal B coefficient-- like 30 points-- but would account for practically no variance in a representative population. Why? Because it is rare, which is another way of saying it has low variance. In a different setting in which it was more common-- say in an inpatient facility that had half individuals with Down's-- it might account for much more variance.
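A rough sketch of the arithmetic behind this example, for a binary predictor whose variance is p(1-p). The 30-point coefficient comes from the comment above; the prevalence of about 1 in 700 and the residual SD of 15 points are illustrative assumptions, not figures from the source.

```python
def r2_binary_predictor(b, prevalence, residual_sd):
    """R2 of a 0/1 predictor with slope b, given its prevalence and the residual SD."""
    var_x = prevalence * (1 - prevalence)  # variance of a 0/1 variable
    explained = b**2 * var_x               # variance in the outcome due to the predictor
    return explained / (explained + residual_sd**2)


B = 30            # points, as in the comment above
RESIDUAL_SD = 15  # assumed spread of the outcome within each group

r2_population = r2_binary_predictor(B, 1 / 700, RESIDUAL_SD)  # assumed prevalence
r2_half_half = r2_binary_predictor(B, 0.5, RESIDUAL_SD)       # facility with half affected

print(f"representative population: R2 = {r2_population:.4f}")
print(f"half-and-half setting:     R2 = {r2_half_half:.4f}")
```

The coefficient is identical in both settings; only the variance of the predictor differs.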
This kind of thinking doesn't apply to PGS for two reasons:
1) PGS consist of common variants by definition. Is there a situation in which we would expect individuals to vary much more than they do? I guess you could say, a population consisting only of people at <5% and >95% of the population, but how do you create that? People like that only occur (scratch scratch scratch) 10% of the time.
2) Arguments like this always seem to imply that there is something special about a low-R2 PGS, but there is not. In fact there is only one kind of linear relationship with an R2 of .01. It isn't as though there are some relations with low R2 where the 1% and 99% don't differ, but for PGS they do. In fact, the unstandardized coefficient B and R2 are related by a simple equation:
R2 = (B*sx/sy)^2
where sx and sy are the standard deviations of the predictor and the outcome. This is why, for a given B, R2 increases as the variance of the predictor increases.
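For a simple linear regression the standardized slope is r = B*(sx/sy), so R2 is its square. A quick numerical check of that identity, using made-up toy data (the values have no empirical meaning):

```python
from statistics import mean, pstdev


def slope_sds_r2(xs, ys):
    """Return (B, sx, sy, R2) for a simple least-squares regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sx, sy = pstdev(xs), pstdev(ys)
    b = cov / sx**2              # unstandardized slope
    r2 = (cov / (sx * sy)) ** 2  # squared Pearson correlation
    return b, sx, sy, r2


# Made-up toy data, used only to check the identity numerically.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 1.9, 3.6, 3.2, 5.1, 4.4]

b, sx, sy, r2 = slope_sds_r2(xs, ys)
print(f"B={b:.3f}, sx={sx:.3f}, sy={sy:.3f}")
print(f"R2 from correlation: {r2:.4f}")
print(f"(B*sx/sy)^2:         {(b * sx / sy) ** 2:.4f}")
```

The last two printed values agree, whatever data are plugged in.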
The argument about the strength of the PGS effects relies on looking at the effects of a 1SD or more change of the PGS. I can't see how this choice was justified, and I would think that interventions that are so powerful are rather rare. I assume that one could make a similar argument, "a powerful intervention on variable x has a large effect on outcome y" for many variables.
Actual interventions changing the PGS are practically non-existent, except among a few hard-core tech people. But 1 SD is (by definition) not a large change compared to the population! It’s like moving from the average to being better than about 84% of people.
Agreed that there are no interventions, and given the polygenicity of complex traits it seems unlikely that this will be possible.
I do however think that an intervention that changes a causal factor by 1 SD is very powerful, given that 95% of the population are within +/- 2 SD. I'd be very surprised if there are many studies that could implement such interventions. To put this in context, let's assume we are interested in the effect of IQ on educational attainment. I don't think that many people would say that an intervention that increases IQ from 85 to 100 (SD=15) is by definition small.
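For reference, the percentiles involved can be checked under a normal approximation. This is a small sketch using the conventional IQ scaling (mean 100, SD 15); treating the scores as exactly normal is an assumption.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # conventional IQ scaling, assumed exactly normal

# Percentile rank at one SD below the mean, at the mean, and one SD above it.
for score in (85, 100, 115):
    print(f"IQ {score}: percentile {iq.cdf(score) * 100:.1f}")

# Share of the population within two SDs of the mean.
within_2sd = iq.cdf(130) - iq.cdf(70)
print(f"share within +/- 2 SD: {within_2sd * 100:.1f}%")
```

So a 1 SD shift from 85 to 100 moves someone from roughly the 16th percentile to the 50th.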
I don’t think it’s worth speculating about whether such interventions are “big” or “small” - right now they’re mostly impossible; if they ever become possible, then big ones will be just as possible. The point is that the population variation already encompasses this much. So, the existing genetic differences between people can have effects as large as those of many very powerful policy interventions.
https://open.substack.com/pub/ericturkheimer/p/news-flash-effect-size-doesnt-matter?r=a61h9&utm_medium=ios
Also see the link to my longer response, above.