Greg Clark claims on Razib Khan’s Unsupervised Learning that birth order — being an elder or younger sibling — doesn’t matter in explaining your adult status. Is that true? Let’s find out.
Birth order is interesting because within a group of siblings, it is independent of genetics. You and your sister are genetically different, but you are both random draws from mummy and daddy’s genomes. Birth order makes no difference to this draw.[1] So, if it matters, it shows that there are environmental influences at work.
Actually, among siblings, parental age (exactly how old your parents were when they had you) is also independent of genetics. So, this is a second test for environmental influences.[2] There’s a big literature saying that earlier-born siblings do better. On the other hand, based on our own work, I’d expect that children of older parents do better. So these two effects might work in opposite directions.
In families who all have the same number of children, birth order is exactly evenly distributed. For example, each family with 3 children has one first child, one second child and one third child. So, if the sample is random, by definition an individual’s birth order won’t correlate with anything else about their family. That isn’t true of parental age, though: parents who have children at an older age will be different in many ways, including genetics. Differences in parental age within a family of siblings are independent of genetics, but differences between families are not.
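A toy simulation can make this balance argument concrete. The sketch below is purely illustrative: the family "trait", the link between that trait and the age at first birth, and the 2.5-year spacing are all made-up assumptions, not features of Clark's data. It only shows that, with family size held fixed, birth order has zero covariance with any family-level characteristic, while parental age does not.

```python
import random
from statistics import mean

random.seed(0)

def simulate_family(n_children):
    """Return one (family_trait, birth_order, parent_age) tuple per child."""
    trait = random.gauss(0, 1)  # hypothetical family-level characteristic
    # Assumption for illustration: higher-trait families start having children later.
    first_birth_age = 25 + 2 * trait + random.gauss(0, 2)
    rows = []
    for order in range(1, n_children + 1):
        age = first_birth_age + 2.5 * (order - 1)  # assumed 2.5-year spacing
        rows.append((trait, order, age))
    return rows

def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return mean((x - mx) * (y - my) for x, y in zip(xs, ys))

# 2000 simulated families, all with exactly 3 children.
children = [row for _ in range(2000) for row in simulate_family(3)]
traits = [r[0] for r in children]
orders = [r[1] for r in children]
ages = [r[2] for r in children]

cov_trait_order = covariance(traits, orders)
cov_trait_age = covariance(traits, ages)
print(f"cov(trait, birth order)  = {cov_trait_order:.6f}")
print(f"cov(trait, parental age) = {cov_trait_age:.3f}")
```

Every family contributes birth orders 1, 2 and 3 exactly once, so the first covariance vanishes by construction, whatever the trait is. The second does not, because parental age varies between families as well as within them.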
Greg Clark’s sample of English families (available from his PNAS paper) has some siblings with the same parents. We can use these to test whether birth order and parental age matter.
The available outcome variables are:
Did the person go to university?
Logarithm of house value
Literacy from signing marriage certificate (only available for a few people)
Occupational status index 0-100
Index of multiple deprivation of the person’s address, 2019
Modern status (this just aggregates the other indices, so I haven’t used it)
Company directorship (I thought this would be rare, so didn’t use it)[3]
I’ll run within-family regressions, that is, regressions that only compare siblings within the same family. I only use fathers who had one or two wives in the data (not simultaneously, one hopes).[4] This keeps most of the data and removes a weird anomaly where person ID 254102 seems to have children by more than 1000 wives. A family always means a unique combination of father and mother.
Here are the results. “byr” is birth year: within families, this is equivalent to parental age at birth.
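For readers who want to see the mechanics, here is a minimal sketch of a within-family (fixed effects) regression on simulated data. It is not the code behind this post: the column names (`family`, `order`, `byr`, `y`), the data-generating process and the "true" effect sizes are all assumptions made up for illustration. The idea is to subtract each family's mean from every variable and then run ordinary least squares on what is left.

```python
import random
from collections import defaultdict

random.seed(1)

TRUE_ORDER_EFFECT = -0.10  # made-up value, used only to generate fake data
TRUE_BYR_EFFECT = 0.05     # made-up value, used only to generate fake data

def simulate(n_families=3000):
    """Generate one row per child with a shared family effect."""
    rows = []
    for fam in range(n_families):
        n_kids = random.randint(2, 6)
        family_effect = random.gauss(0, 1)   # shared by all siblings
        byr = 1850 + random.uniform(0, 50)   # birth year of the first child
        for order in range(1, n_kids + 1):
            y = (family_effect + TRUE_ORDER_EFFECT * order
                 + TRUE_BYR_EFFECT * byr + random.gauss(0, 0.2))
            rows.append({"family": fam, "order": order, "byr": byr, "y": y})
            byr += random.uniform(1, 5)      # gap to the next sibling
    return rows

def demean_by(rows, key, cols):
    """Subtract group means of `cols`, grouping rows by `key`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    out = []
    for members in groups.values():
        means = {c: sum(m[c] for m in members) / len(members) for c in cols}
        out.extend({c: m[c] - means[c] for c in cols} for m in members)
    return out

def ols_two_regressors(rows, y, x1, x2):
    """Solve the 2x2 normal equations for y = b1*x1 + b2*x2 (no intercept)."""
    s11 = sum(r[x1] * r[x1] for r in rows)
    s22 = sum(r[x2] * r[x2] for r in rows)
    s12 = sum(r[x1] * r[x2] for r in rows)
    s1y = sum(r[x1] * r[y] for r in rows)
    s2y = sum(r[x2] * r[y] for r in rows)
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

data = simulate()
within = demean_by(data, "family", ["y", "order", "byr"])
b_order, b_byr = ols_two_regressors(within, "y", "order", "byr")
print(f"birth order: {b_order:.3f}   birth year: {b_byr:.3f}")
```

Demeaning removes anything constant within a family, including the shared family effect, so the estimates are driven only by differences between siblings. Because the family mean of birth year differs from the family mean of parental age only by a constant, birth year and parental age at birth give the same within-family coefficient.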
Higher education, literacy and house value all show the pattern you’d expect. Children with more elder siblings do worse; children of older parents do better. Some effects are quite large. Every extra elder sibling reduces your chance of going to university by 1 percentage point (12% of the sample go to university) and of being literate by 2.5 percentage points (77% of the sample are literate). He or she also knocks about £3000 off the value of your house (at the mean value of exp(11.9) ≈ £147,267). An extra year of parental age increases average literacy by 1 percentage point.
The effects on occupational status and IMD 2019 are statistically insignificant. Those on occupational status are precisely estimated and very small relative to its 0-100 scale.
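The arithmetic behind these magnitudes is easy to check. The snippet below uses only figures quoted above (the mean log house value of 11.9, the roughly £3000 penalty, and the 12% and 77% base rates). The variable names are just labels chosen for this illustration.

```python
import math

mean_log_value = 11.9                  # mean of log house value, from the text
mean_value = math.exp(mean_log_value)  # back-transformed to pounds
sibling_penalty = 3000                 # approximate figure quoted in the text

# Share of the mean house value lost per extra elder sibling.
implied_share = sibling_penalty / mean_value

# Percentage-point effects expressed relative to the sample base rates.
university_relative = 1.0 / 12.0       # 1 pp on a 12% base
literacy_relative = 2.5 / 77.0         # 2.5 pp on a 77% base

print(f"mean house value: £{mean_value:,.0f}")
print(f"house value lost per elder sibling: {implied_share:.1%}")
print(f"relative drop in university attendance: {university_relative:.1%}")
print(f"relative drop in literacy: {literacy_relative:.1%}")
```

Put this way, the university effect is the largest in relative terms, because the base rate is so low.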
We can also run pooled regressions, where we look at differences between families not just within them. We still control for family size. The risk is that father’s age at birth will now pick up differences between families that are genetic. Birth order shouldn’t, unless there is some special sample selection, for the reasons mentioned above. Here’s what that looks like:
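The risk described here can be illustrated with a small simulation. Everything in it is an assumption made up for the example: an unobserved family trait that raises both the outcome and the father's age at first birth, the "true" effect sizes, and the column names. It fits the same regression twice, once removing family means (within-family) and once removing only family-size means, which is equivalent to a pooled regression with family-size dummies.

```python
import random
from collections import defaultdict

random.seed(2)
B_ORDER, B_AGE = -0.10, 0.05  # made-up "true" effects for the simulation

def simulate(n_families=3000):
    rows = []
    for fam in range(n_families):
        size = random.randint(2, 5)
        trait = random.gauss(0, 1)                 # unobserved family trait
        age = 28 + 3 * trait + random.gauss(0, 2)  # father's age at first birth
        for order in range(1, size + 1):
            y = trait + B_ORDER * order + B_AGE * age + random.gauss(0, 0.2)
            rows.append({"family": fam, "size": size, "order": order,
                         "age": age, "y": y})
            age += random.uniform(1, 5)            # gap to the next sibling
    return rows

def fit(rows, group_key):
    """OLS of y on order and age after removing group means (fixed effects)."""
    cols = ("y", "order", "age")
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r)
    d = []
    for members in groups.values():
        means = {c: sum(m[c] for m in members) / len(members) for c in cols}
        d.extend({c: m[c] - means[c] for c in cols} for m in members)
    s11 = sum(r["order"] ** 2 for r in d)
    s22 = sum(r["age"] ** 2 for r in d)
    s12 = sum(r["order"] * r["age"] for r in d)
    s1y = sum(r["order"] * r["y"] for r in d)
    s2y = sum(r["age"] * r["y"] for r in d)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

data = simulate()
within_order, within_age = fit(data, "family")  # family fixed effects
pooled_order, pooled_age = fit(data, "size")    # family-size dummies only
print(f"within: order {within_order:.3f}, age {within_age:.3f}")
print(f"pooled: order {pooled_order:.3f}, age {pooled_age:.3f}")
```

In this toy setup the within-family estimates sit close to the values used to generate the data, while the pooled age coefficient also absorbs the family trait. Whether anything similar happens in the real data is a separate question that the simulation cannot answer.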
Effects now all go in the expected direction: more elder siblings are bad for you, older parents are good for you. Most effect sizes are bigger, except for literacy, where the effect gets smaller and becomes insignificant. In particular, the effect on occupational status now gets much larger: I wonder why it’s so different in the pooled specification!
What have we learned?
The effects of birth order and parental age go in opposite directions. But both of them matter. Within families, they are highly correlated (at about 0.85). But the dataset is large enough that I don’t think this causes problems with collinearity. The correlation is much lower across families: about 0.39, once you compare only families of the same size.
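One rough way to gauge the collinearity worry is the variance inflation factor. With exactly two regressors it is 1 / (1 - r²), where r is their correlation. The sketch below plugs in the two correlations quoted above. It is a back-of-envelope check that ignores the other controls in the actual regressions, so treat the numbers as indicative only.

```python
import math

def variance_inflation(r):
    """VIF for a regression with exactly two regressors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)

for label, r in [("within families", 0.85), ("across families", 0.39)]:
    vif = variance_inflation(r)
    print(f"{label}: r = {r:.2f}, VIF = {vif:.2f}, "
          f"standard errors inflated by a factor of {math.sqrt(vif):.2f}")
```

A within-family correlation of 0.85 roughly doubles the standard errors compared with uncorrelated regressors. That costs precision, but it stays below the thresholds of 5 or 10 that are often used as rules of thumb for problematic collinearity.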
You could argue that in practice, the two effects cancel out. That is probably true in the sense that overall, eldest children were not advantaged much. (If you don’t include parental age, only higher education is significantly associated with birth order; each extra elder sibling reduces your chance of going to university by 0.5 percentage points.) But this is because they are being buffeted by two opposing winds: the advantage of being eldest and the disadvantage of having younger parents. And parental age must have made a difference, irrespective of the number of siblings someone had.
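The cancelling-out can be written down with the standard omitted-variable formula. The notation here is introduced only for this illustration: the betas are the coefficients from the regression that includes both variables, the tilde marks the coefficient when parental age (birth year) is left out, and delta is the within-family slope of birth year on birth order.

```latex
\tilde{\beta}_{\text{order}}
  = \beta_{\text{order}} + \beta_{\text{byr}} \, \delta,
\qquad
\delta = \frac{\operatorname{Cov}_w(\text{order}, \text{byr})}
              {\operatorname{Var}_w(\text{order})} > 0
```

Later-born siblings necessarily have later birth years, so delta is positive. With a negative birth order effect and a positive parental age effect, the second term pulls the short-regression coefficient towards zero, which matches the direction of the university figures above (1 percentage point with parental age included, 0.5 without).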
The effects are intuitively “big” in terms of their impact on the subjects, but they explain very little of the variation in most outcomes — the within r-squared statistics are tiny. But I don’t think that is the point. Greg Clark’s argument in his PNAS paper is “a simple model of genetic transmission can explain the patterns in the data”. He makes a set of ancillary arguments showing that e.g. various kinds of cultural transmission theories don’t explain the data: for example, it doesn’t matter much if your father died while you were young. This is important, because an alternative explanation of the main pattern in his data could be cultural transmission. (He correctly points out that cultural transmission theories can be rather “protean”, whereas genetics are tightly specified by biology.)
Well, the problem is, genetic transmission can’t explain birth order effects. The only explanation is some form of environmental variation between siblings. The point of birth order effects is not that they are important forms of environmental variation: the point is that they are clean of genetics. If birth order can matter, then maybe environmental variation more generally matters — including the many kinds of variation that are likely to covary with genetics.
Code is available here.
[Update: I think I should emphasize that the English families data is a major public good, and I’m grateful to Greg Clark for producing it.]
If you liked this post, you might like my book Wyclif’s Dust: Western Cultures from the Printing Press to the Present. It’s available from Amazon, and you can read more about it here.
I recently started Lapwing, a more personal newsletter which is a story of my family.
[1] This would be untrue if people chose their family size based on their children’s genetics! Like, if your first kid is really smart, you have another one, so then first children of two are disproportionately smart. I don’t think anyone has seen any evidence of this, though.
[2] Older fathers do have more mutations in their sperm. But this is probably a very small effect, and anyway we see that children of older fathers have better outcomes, not worse ones.
[3] If you do use company directorship, both birth order and father’s age are insignificant in the within-family regressions, and significant in the pooled ones.
[4] Nothing much changes if I only use fathers with just one wife.