The vast deserts of bad science
My name is Ozymandias, PI of PIs. Look on my p values, ye mighty, and despair
Correlation is not causality. You know this already. My readers are a scientifically literate bunch (please buy a paid subscription 💓 ). I just don’t think people realise (a) how much of science just goes on as if correlation were causality and (b) how serious the effects are.
There are huge deserts of bad science out there. From the middle of them it’s wasteland as far as the eye can see: nobody is doing any different. A new PhD student, parachuted into this wasteland, will just assume that what they see is science.
This does just as much damage as p-hacking, HARKing (hypothesizing after the results are known), and other more famous bad practices. Its results are probably worse than silly research in the humanities. Research that is obviously silly is easy to spot and laugh at. Whereas this stuff looks scientific! It is not as bad as scientific fraud, but it is far more common, so its total effects are probably worse.
Really it is a kind of scientific fraud. It’s just collective fraud. There’s no excuse for whole fields to pretend you can just run a regression, throw in a bunch of controls and hope to learn anything.
I don’t claim that correlations are never interesting! They can be, and they always need some explanation. But they aren’t causality. Also, correlations with good enough controls can sometimes be plausible evidence for causality. But it’s rare, and in some settings it is just never going to happen.
Someone needs to circle over this nonsense like a dragon, vomiting flame and bile until it shrivels and is consumed in fire. Allow me.
I’ll start with two examples.
Veterinary science and cruciate ligaments
Two years ago my dog hurt his back legs. Cruciate ligament injuries are very common among dogs; fixing them is a billion dollar business. There are various treatment options, including doing nothing. My vet suggested an operation which would cost four figures. Being a nerd, I started reading the academic literature to discover how effective it was.
Here’s what I found. In 2005, Aragon and Budsberg’s “Applications of evidence‐based medicine: cranial cruciate ligament injury repair in the dog” came out. (“Evidence-based medicine” is a terrifying phrase, no? What other kind of medicine is there, and who are they using it on?) The article said that we had essentially no solid data on whether all these treatments worked:
the current available evidence suggests that there is not a single surgical procedure that has enough data to recommend that it can consistently return dogs to normal function after CCL injury.
That’s pretty harsh and unambiguous. So, the scientific process went to work.
Nine years later, another review article came out. From the abstract:
Two studies provided level 1, 6 provided level 2, 6 provided level 3, and 20 provided level 4 evidence relative to the study question. The most common surgical procedures included tibial plateau leveling osteotomy (TPLO, n = 14), lateral extracapsular suture (n = 13), and tibial tuberosity advancement (n = 6). The strength of the evaluated evidence most strongly supports the ability of the TPLO to return dogs to normal function.
This sounds much better. “Levels” of evidence here are a measure used in medicine: level 1 is the highest (a randomized controlled trial or similar), level 4 is the lowest. I began to skim the papers cited, to see how papers at each level differed from one another.
I gave up after levels 1 and 2. If they were this bad, then there was no point wasting time on the others.
There was essentially just one true randomized trial of any of these treatments. It was a hot mess. There were two treatment arms, N = 15 and N = 23, plus a control group of N = 80. Even for comparing the N = 23 treatment with the control, that’s vastly underpowered. (Nerds can play with statistical power tests here: to detect a half-a-standard-deviation difference, you’d have only 55% power.) To detect differences between the treatments, forget it.
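(If you want to check that 55% figure yourself, here’s a minimal sketch in Python using statsmodels. I’m assuming a two-sided two-sample t-test at α = 0.05; the paper doesn’t spell out the test, so treat this as an illustration, not a reproduction of their analysis.)

```python
# Power of a two-sample t-test comparing the N = 23 treatment arm
# with the N = 80 "control" group, for a medium effect (d = 0.5).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(
    effect_size=0.5,       # half a standard deviation
    nobs1=23,              # treatment arm
    ratio=80 / 23,         # "control" group of 80 dogs
    alpha=0.05,
    alternative="two-sided",
)
print(f"Power: {power:.2f}")  # roughly 0.55 — barely better than a coin flip
```

Run the same calculation with nobs1=15 and ratio=23/15, i.e. the two treatment arms against each other, and power drops to roughly 0.3. That is what I mean by “forget it”.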
Oh, but I forgot: the control group wasn’t randomized. It was just a bunch of dogs “recruited from the community”. They didn’t even have CCL problems! I am not joking.
So you are stuck with comparing two groups with an N of 15 and 23. When it starts this bad, the details usually get worse. Sure enough: dogs were allocated to treatment group based on owner preference. I mean, if the owners know what to do, why even run the experiment? After 6 months there were only 10 and 15 dogs respectively in each treatment arm. Of the rest, some were “lost to follow up” (at random? Who knows?) Others had further CCL injuries, which happens quite a lot after the first injury, and were then excluded from subsequent data. (If you’re missing the point here, imagine testing the safety of a new foot medicine, and excluding some subjects from the results because their feet had exploded.) What else? Multiple statistical tests but no correction for multiple testing…. Look. Junk is junk.
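On that last point, the arithmetic is brutal. With, say, ten outcomes each tested at p < 0.05 (the ten is my illustration; I didn’t count the tests in the paper), the chance of at least one spurious “significant” result on pure noise is already about 40%:

```python
# Family-wise error rate: probability of at least one false positive
# when running k independent tests at alpha = 0.05 on pure noise.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests: {fwer:.0%} chance of a spurious 'finding'")
# A Bonferroni correction would instead test each outcome at alpha / k.
```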
Criticizing this particular study is not the point. I can imagine just how tough it is to run this kind of work, given dogs and their owners. And remember, this was not the worst study run. It was the only randomized trial (except not even randomized).
The bottom line is that veterinary medicine is a fifty billion dollar industry. Yet, a decade after a scathing review of the evidence-free methods vets use to treat one of the commonest dog ailments, they hadn’t got it together to run a single decent large N randomized controlled trial.
I often joke that the one time I feel proud to be a social scientist is when I read medical research. But veterinary research makes medical research look good. It is a disaster of underpowered small N studies, studies without a control group, non-randomized treatments, correlational data, practitioners doing “research” to promote their own treatments…. Essentially, it’s astrology. But much better paid.
What’s wrong here? Why isn’t real science being done? I could give superficial guesses: veterinary practice is a very fragmented industry, maybe, making it hard to run large-scale clinical trials. But at a deeper level I think it is simpler. Nobody has an incentive to do real science, because vets are marking their own homework. Sure, the treatment works! It seems to work!
As for Adler, I was much impressed by a personal experience. Once, in 1919, I reported to him a case which to me did not seem particularly Adlerian, but which he found no difficulty in analysing in terms of his theory of inferiority feelings, although he had not even seen the child. Slightly shocked, I asked him how he could be so sure. ‘Because of my thousandfold experience,’ he replied; whereupon I could not help saying: ‘And with this new case, I suppose, your experience has become thousand-and-one-fold.’
Popper, Conjectures and Refutations
We paid for the operation, in the end. Our dog recovered OK, though he still looks a bit awkward when he runs because the vet reset his legs to be straighter. Did the operation help? How should I know?
TV and screen time effects
I have gone paid. Now is a great time to subscribe and support my writing. It costs just £3.50/month, and yearly subscribers get a great big 40% discount, plus a free copy of my book.
The article continues below for paid subscribers.