Malte Elson is blunt when it comes to science’s ability to self-correct. “The way we currently treat errors doesn’t work,” he says.
To prove his point, Elson, a psychologist at the University of Bern, highlights a well-known 2010 paper1 by economists Carmen Reinhart and Kenneth Rogoff at Harvard University in Cambridge, Massachusetts. “This paper became highly influential in financial policies in Europe,” says Elson, where it “promoted austerity measures to reduce national debt”. Three years later, Thomas Herndon, then an economics PhD student at the University of Massachusetts Amherst, tried to replicate the paper’s results for a class assignment and discovered an error in a crucial spreadsheet: the authors had selected only 15 of the 20 countries they meant to include in a key calculation2. When this and two other errors were taken into account, the study’s conclusions were weaker than they initially appeared, Elson says.
Reinhart and Rogoff cooperated by providing their data and acknowledging the errors, although they have maintained that their overall conclusion is sound. Still, the errors might never have been discovered had Herndon not tried to reproduce the results.
This haphazard system of error detection makes no sense, Elson says. “We cannot seriously rely on coincidental discovery of errors.” Currently, looking for errors in published papers is neither systematic nor rewarded. Elson and his colleagues launched the Estimating the Reliability and Robustness of Research (ERROR) project in February to change that.
The ERROR project pays reviewers to check highly cited psychology and psychology-related papers for errors in code, statistical analyses and reference citations. The programme posted its first review in May — the first of 100 planned over 4 years. This month, the ERROR team aim to have the first 20 papers assigned to reviewers.
Led by Elson, Ian Hussey, a meta-scientist also at the University of Bern, and Ruben Arslan, a psychologist at Leipzig University in Germany, ERROR focuses on papers with a continuous stream of citations that were published in “important and respected journals in subdisciplines of psychology” since January 2015, Elson says. The ERROR team prioritizes highly cited papers to maximize the impact of its efforts, and contacts study authors asking for their permission to review their work. “For ERROR to be successful, it’s important that everybody is on board,” Elson explains — but the team also requires access to each paper’s underlying data and code, which only authors can provide.
The project is funded by the Humans in Digital Transformation programme, a fund that drives a digitalization strategy at the University of Bern and has offered ERROR 4 years of support and 250,000 Swiss francs (US$289,000). From that budget, reviewers are paid up to 1,000 francs for each paper they check. They also receive a bonus for any errors they find, with bigger bonuses for bigger errors, such as those that result in a major correction notice or a retraction, up to a maximum of 2,500 francs. The bonus is modelled on the ‘bug bounty’ programmes that technology companies such as Microsoft and Google offer to hackers who find and report vulnerabilities in their products.
Errors can include mistakes in code, discrepancies between the code and the wording of the manuscript, statistical analyses that do not support the conclusions or are misinterpreted, and inaccurate citations.
Authors are compensated as well: 250 francs for answering reviewer questions and making data available, with an extra 250 francs if the reviewer finds only minor or no errors.
ERROR posted its first review in May3, of a 2018 paper4 in the journal Psychophysiology authored by cognitive neuroscientist Jan Wessel at the University of Iowa in Iowa City. The process was exemplary, Hussey says, thanks in part to the open-mindedness of Wessel and of cognitive neuroscientist Russ Poldrack at Stanford University in California, who performed the review and found only minor errors. Wessel even ran a simulation study that estimated a 96% chance that at least one error remains in his data set that ERROR’s review has not caught. “This was a very cool mentality — exactly what we’re hoping to foster,” says Hussey.
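The article does not describe how Wessel’s simulation worked, but the underlying idea is simple to sketch: if every checkable item in a paper has some small chance of containing an error, and a review catches each error only with some probability, the chance that at least one error survives can be estimated by Monte Carlo. The item count, error rate and detection rate in the sketch below are illustrative assumptions, not values from Wessel’s study.

```python
import random

def prob_undetected_error(n_items=200, p_error=0.02, p_detect=0.7,
                          n_sims=100_000, seed=1):
    """Monte Carlo estimate of the chance that at least one error
    survives review, given per-item error and detection rates
    (all inputs are illustrative assumptions)."""
    rng = random.Random(seed)
    runs_with_surviving_error = 0
    for _ in range(n_sims):
        for _ in range(n_items):
            # An item is a problem only if it contains an error
            # AND the review fails to catch it.
            if rng.random() < p_error and rng.random() >= p_detect:
                runs_with_surviving_error += 1
                break
    return runs_with_surviving_error / n_sims

print(prob_undetected_error())  # ~0.70 with these made-up inputs
```

Even with optimistic per-item rates, a paper with many checkable items is likely to harbour at least one undetected error, which is the spirit of Wessel’s estimate.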
Hussey expects to post three more reviews in September. To hit 100 papers in 4 years, the team will need to publish about one review every 2 weeks.
Although initially focused on psychology, the ERROR project is “actively working towards” expanding to other disciplines, says Hussey. The team has applied for funding from the Swiss National Science Foundation to expand into artificial-intelligence research and hopes to take on medical research as well. “More generally, we hope to demonstrate a scalable and transferable model for how to do this, so that other researchers can do it in their own field,” says Hussey. The team is also exploring the possibility of auditing manuscript preprints as well as published articles, Elson says.
Still, the project faces significant challenges. Few authors respond to ERROR’s e-mails asking for permission to review their papers, Elson says: of 134 selected papers so far, only 17 authors have agreed to have their study reviewed. Sometimes the underlying data no longer exist or cannot be found; at other times, authors reply that third parties cannot access the data for legal reasons. Although there are technical solutions to that problem, Elson says he doesn’t press the issue.
A further challenge is finding reviewers who have both the required technical expertise and no conflicts of interest with the study authors. Reviewers, Hussey says, might need more technical knowledge than the authors themselves, “because you have to know about the probabilities of different kinds of errors happening”. Often, such reviewers are PhD students or postdoctoral researchers, who might be put in a difficult career position if they cast doubt on a publication that is authored by more-established researchers. “We are acutely aware of the power dynamics involved,” Hussey says. “We’re trying our best to match that balance of power in who’s doing the critique and who’s being critiqued.”
In 2023, to help grow the pool of potential reviewers, Hussey began teaching a master’s-level course on error detection at the University of Bern’s psychology department. The Institute of Psychiatry, Psychology and Neuroscience at King’s College London ran a similar course at a summer school in July.
Now, the ERROR team hopes to convince research funders to pay for error reviews of the work that they support. Funders stand to benefit from error detection because they pay twice for errors, Elson says: once by wasting money on research that turns out to be incorrect, and again by missing the opportunity to fund a different project. Since May, when he published a World View article5 in Nature about the project, Elson has spoken with the German research foundation, the DFG, and with the Volkswagen Foundation, a private funder.
Compared with the current ad hoc approach, “meaningful discoveries per dollar spent would actually be higher with some degree of systematic error scrutiny”, Hussey says. And a serious error-detection system needs resources, says Elson. “We cannot expect it to work for free.”