Second Analysis of Ferguson’s Model

9 May 2020

by Sue Denim (not the author’s real name)

I’d like to provide a followup to my first analysis. Firstly because new information has come to light, and secondly to address a few points of disagreement I noticed in a minority of responses.

The hidden history. Someone realised they could unexpectedly recover parts of the deleted history from GitHub, meaning we now have an audit log of changes dating back to April 1st. This is still not exactly the original code Ferguson ran, but it’s significantly closer.

Sadly it shows that Imperial have been making some false statements.

I don’t quite know what to make of this. Originally I thought these claims were a result of the academics not understanding the tools they’re working with, but the Microsoft employees helping them are actually employees of a recently acquired company: GitHub. GitHub is the service they’re using to distribute the source code and files. To defend this I’d have to argue that GitHub employees don’t understand how to use GitHub, which is implausible.

I don’t think anyone involved here has any ill intent, but it seems via a chain of innocent yet compounding errors – likely trying to avoid exactly the kind of peer review they’re now getting – they have ended up making false claims in public about their work.

Effect of the bug fixes. I was curious what effect the hidden bug fixes had on the model output, especially after seeing the change to the pseudo-random number generator constants (which means the prior RNG didn’t work). I ran the latest code in single threaded mode for the baseline scenario a couple of times, to establish that it was producing the same results (on my machine only), which it did. Then I ran the version from the initial import against the latest data, to control for data changes.

The resulting output tables were radically different to the extent that they appear incomparable, e.g. the older code outputs data for negative days and a different set of columns. Comparing by row count for day 128 (7th May) gave 57,145,154 infected-but-recovered people for the initial code but only 42,436,996 for the latest code, a difference of about 34%.

I wondered if the format of the data files had changed without the program being able to detect that, so then I reran the initial import code with the initial data. This yielded 49,445,121 recoveries – yet another completely different number.

It’s clear that the changes made over the past month and a half have radically altered the predictions of the model. It will probably never be possible to replicate the numbers in Report 9.

Political attention. I was glad to see the analysis was read by members of Parliament. In particular, via David Davis MP the work was seen by Steve Baker – one of the few British MPs who has been a working software engineer. Baker’s assessment was similar to that of most programmers: “David Davis is right. As a software engineer, I am appalled. Read this now”. Hopefully at some point the right questions will be asked in Parliament. They should focus on reforming how code is used in academia in general, as the issue is structural incentives rather than a single team. The next paragraph will demonstrate that.

Do the bugs matter? Some people don’t seem to understand why these bugs are important (e.g. this computational biology student, or this cosmology lecturer at Queen Mary). A few people have claimed I don’t understand models, as if Google has no experience with them.

Imagine you want to explore the effects of some policy, like compulsory mask wearing. You change the code and rerun the model with the same seed as before. The number of projected deaths goes up rather than down. Is that because:

  • The simulation is telling you something important?
  • You made a coding error?
  • The operating system decided to check for updates at some critical moment, changing the thread scheduling, the consequent ordering of floating point additions and thus changing the results?

You have absolutely no idea what happened. 

In a correctly written model this situation can’t occur. A change in the outputs means something real and can be investigated. It’s either intentional or a bug. Once you’re satisfied you can explain the changes, you can then run the simulation more times with new seeds to estimate some uncertainty intervals.
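
The check described above can be sketched as follows, with a hypothetical `simulate()` standing in for a model run (the real model is C/C++; this is only an illustration of the reproducibility property):

```python
import random

def simulate(seed, steps=1000):
    # Hypothetical stand-in for a model run, not the ICL code.
    # The property being tested: same seed -> same output, always.
    rng = random.Random(seed)
    infected = 100.0
    for _ in range(steps):
        infected *= 1.0 + (rng.random() - 0.5) * 0.01
    return infected

# Rerun with the same seed: in a correctly written model the outputs
# match bit-for-bit, so any change after editing the code is either
# intentional or a bug - never scheduler noise.
assert simulate(seed=42) == simulate(seed=42)

# Different seeds give different runs, which is what you then use
# (many seeds) to estimate uncertainty intervals.
assert simulate(seed=1) != simulate(seed=2)
```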

In an uncontrollable model like ICL’s you can’t get repeatable results, and if the expected size of a change is smaller than the arbitrary run-to-run variation, you can’t conclude anything from the model. And precisely because the variations are arbitrary, you don’t actually know how large they can get, which means there’s no way to conclude anything at all.

I ran the simulation three times with the code as of commit 030c350, with the default parameters, fixed seeds and configuration. A correct program would have yielded three identical outputs. For May 7th the max difference of the three runs was 46,266 deaths or around 1.5x the actual UK total so far. This level of variance may look “small” when compared to the enormous overall projections (which it seems are incorrect) but imagine trying to use these values for policymaking. The Nightingale hospitals added on the order of 10-15,000 places, so the uncontrolled differences due to bugs are larger than the NHS’s entire crash expansion programme. How can any government use this to test policy?

An average of wrong is wrong.  There appears to be a seriously concerning issue with how British universities are teaching programming to scientists. Some of them seem to think hardware-triggered variations don’t matter if you average the outputs (they apparently call this an “ensemble model”).

Averaging samples to eliminate random noise works only if the noise is actually random. The mishmash of iteratively accumulated floating point uncertainty, uninitialised reads, broken shuffles, broken random number generators and other issues in this model may yield unexpected output changes but they are not truly random deviations, so they can’t just be averaged out. Taking the average of a lot of faulty measurements doesn’t give a correct measurement. And though it would be convenient for the computer industry if it were true, you can’t fix data corruption by averaging.
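
A toy demonstration of the difference, with made-up numbers: averaging cancels genuinely zero-mean random noise, but a systematic error of the kind described above survives any number of samples.

```python
import random

random.seed(0)
true_value = 100.0
bias = 7.0  # systematic, non-random error (made-up magnitude)

# Zero-mean random noise averages away as samples accumulate...
noisy = [true_value + random.gauss(0, 5) for _ in range(100_000)]
mean_noisy = sum(noisy) / len(noisy)
assert abs(mean_noisy - true_value) < 0.1

# ...but a consistent bias survives any amount of averaging: the
# "ensemble" converges confidently to the wrong answer.
biased = [true_value + bias + random.gauss(0, 5) for _ in range(100_000)]
mean_biased = sum(biased) / len(biased)
assert abs(mean_biased - true_value) > 6.0
```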

I’d recommend all scientists writing code in C/C++ read this training material from Intel. It explains how code that works with fractional numbers (floating point) can look deterministic yet end up giving non-reproducible results. It also explains how to fix it.

Processes not people. This is important: the problem here is not really the individuals working on the model. The people in the Imperial team would quickly do a lot better if placed in the context of a well run software company. The problem is the lack of institutional controls and processes. All programmers have written buggy code they aren’t proud of: the difference between ICL and the software industry is the latter has processes to detect and prevent mistakes.

For standards to improve academics must lose the mentality that the rules don’t apply to them. In a formal petition to ICL to retract papers based on the model you can see comments “explaining” that scientists don’t need to unit test their code, that criticising them will just cause them to avoid peer review in future, and other entirely unacceptable positions. Eventually a modeller from the private sector gives them a reality check. In particular academics shouldn’t have to be convinced to open their code to scrutiny; it should be a mandatory part of grant funding.

The deeper question here is whether Imperial College administrators have any institutional awareness of how out of control this department has become, and whether they care. If not, why not? Does the title “Professor at Imperial” mean anything at all, or is the respect it currently garners just groupthink? 

Insurance. Someone who works in reinsurance posted an excellent comment in which they claim:

  • There are private sector epidemiological models that are more accurate than ICL’s.
  • Despite that they’re still too inaccurate, so they don’t use them.
  • We always use 2 different internal models, plus for major decisions an external, independent view, normally from a broker. It’s unbelievable that a decision of this magnitude was based off a single model.

They conclude by saying “I really wonder why these major multinational model vendors who bring in hundreds of millions in license fees from the insurance industry alone were not consulted during the course of this pandemic.”

A few people criticised the suggestion for epidemiology to be taken over by the insurance industry.  They had insults (“mad”, “insane”, “adding 1 and 1 to get 11,000” etc) but no arguments, so they lose that debate by default. Whilst it wouldn’t work in the UK where health insurance hardly matters, in most of the world insurers play a key part in evaluating relative health risks.

269 Comments
Simon
26 days ago

Is there enough here to say that people who’ve lost a shed load of money as a result of this lock-down could potentially sue Imperial College for loss of earnings in a civil court case?

earthflattener
26 days ago
Reply to  Simon

No, they didn’t make any decision. Moreover, the Imperial model coincides with the predictions of many other models. Most countries chose to lock down on the basis of quite different models – but all seemingly pointed towards the same problem.

forsyth
26 days ago
Reply to  earthflattener

Are you sure that the problem it’s seemingly pointing towards isn’t the use of unvalidated models?

earthflattener
26 days ago
Reply to  forsyth

What, they are all unvalidated, but some malign deity has organized the universe such that they all give the same answer? Damn Soros, he has even got God in the game now!

forsyth
25 days ago
Reply to  earthflattener

Validation shows that a model is an acceptable or usable representation of a system. None of them so far predict what actually happens or, equivalently, they give such wide CIs that they aren’t usable as “predictions”. In effect, they are currently only consistent with themselves. That suggests that one or more assumptions made either by the model or its input data must be wrong, once implementation errors have been ruled out.

CLARKE PITTS
23 days ago
Reply to  earthflattener

“Most countries chose to lockdown on the basis of quite different models”
Imperial & Ferguson have been advising other countries too.

R J Barnard
26 days ago
Reply to  Simon

Students at Imperial who have failed to obtain their degrees might have a claim for a refund of their tuition fees on the basis that the standard of education was not what a reasonable person could expect. Some no-win no-fee lawyers might be able to advise.

Chris.G
25 days ago
Reply to  Simon

Indemnity for health service activity: England and Wales
https://www.legislation.gov.uk/ukpga/2020/7/section/11/enacted

Simon
24 days ago
Reply to  Chris.G

Thanks for this. I’m not a lawyer, but I’d definitely want to see what a professional legal eagle made of that type of gov.uk statement regarding indemnity – it seems to me to cover doctors and nurses and so on, but not epidemiological modelling, particularly by a Higher Education Institution in a situation where basic everyday life is then cancelled on the basis of their modelling. Much would, I suppose, depend on how this model was “sold” to government.

But for all of us out there who run businesses, when you read the small print of your business insurance, you realise that indemnity only goes so far in terms of covering you. If you leave an angle grinder turned on in the street and it maims someone, for example, you can’t simply cite your indemnity. So here, what does this type of statement mean when it comes to Imperial College and their allegedly Keystone Kops modelling procedures? (It’s a genuine question, by the way.)

To me, we have an extraordinary context where epidemiological modelling by a UK HEI hasn’t been subject to peer review, and where there may – if our anonymous and entirely credible-sounding Sue Denim is to be trusted – be major issues with the set-up of the modelling software, and – crucially – the prior knowledge of the team regarding those issues. I think it would be very interesting to see the British Chambers of Commerce and/or CBI (etc.) come together to get some legal advice regarding what’s happened here.

John
10 days ago
Reply to  Chris.G

“The US White House has appointed a coronavirus “Vaccine Czar” from Big Pharma to oversee something dubbed Operation Warp Speed. The goal is to create and produce 300 million doses of a new vaccine to supposedly immunize the entire US population by year-end against COVID-19. To be sure that Big Pharma companies give their all to the medical Manhattan Project, they have been fully indemnified by the US government against liabilities should vaccine recipients die or develop serious disease as a result of the rushed vaccine. The FDA and NIH have waived standard pre-testing on animals in the situation. The US military, according to recent remarks by the US President, is being trained to administer the yet-to-be unveiled vaccine in record time. Surely nothing could go wrong here? …

… At a May 15 White House press conference where the President introduced Slaoui as the head of the crash vaccine project, Slaoui stated, “Mr. President, I have very recently seen early data from a clinical trial with a coronavirus vaccine. These data make me feel even more confident that we will be able to deliver a few hundred million doses of vaccine by the end of 2020.”

Though he did not say, he was clearly referring to Moderna and its mRNA gene-edited vaccine, the first US vaccine authorized to enter Phase I human trials after the US government gave the company a staggering $483 million of funding to fast-track the COVID-19 vaccine.

Vaccine Czar Slaoui is well-placed with regard to Moderna. After leaving GSK from 2017 until he joined the Trump Operation Warp Speed, Slaoui was on the Moderna Board of Directors. He also still holds $10 million worth of Moderna stock options, options likely to soar in value as the Warp Speed zooms forward. This would suggest a glaring conflict of interest with Czar Slaoui, but that’s only the start of this saga, where millions of lives are potentially at threat from a novel inadequately-tested or proven genetically edited vaccine. …

… Moderna claims that between January 11, when they got the DNA sequence of the virus from China, and January 13–in just two days–working together with Anthony Fauci’s National Institute of Allergies and Infectious Diseases (NIAID) of NIH, they managed to finalize the sequence for mRNA1273 vaccine against the novel coronavirus. At that point Fauci announced unprecedented plans to run human Phase I trials of the vaccine without prior animal studies. The FDA waived animal pretest requirements. The Moderna mRNA1273 tests were funded by the Gates Foundation-funded Coalition for Epidemic Preparedness Innovations (CEPI). …

… All this, despite the evidence of extreme conflicts of interest between NIAID and other agencies of the US Government with Moderna and now-Vaccine Czar and former Moderna director Slaoui, might be treated more lightly, were it not for the fact that Moderna’s mRNA gene-edited vaccine technology is entirely experimental and never before approved for use as a vaccine. The company itself admits as much. It says, “mRNA is an emerging platform… we are still early in the story. Our most advanced vaccine program (CMV) is in Phase 2 clinical testing and we have no approved drugs to date.” …

… However, numerous scientists warn that once inside the cell nucleus, mRNA vaccines have a risk of permanently changing a person’s DNA in unpredictable ways. Tony Fauci’s own NIH published a scientific paper regarding the new mRNA vaccine prospects. It read in part, “innate immune sensing of mRNA has also been associated with the inhibition of antigen expression and may negatively affect the immune response. Although the paradoxical effects of innate immune sensing on different formats of mRNA vaccines are incompletely understood, some progress has been made in recent years in elucidating these phenomena.” This is highly experimental science. …

… The US government, in a tight-knit circle all tied to Tony Fauci’s NIAID, the Gates Foundation, WHO are moving with not warp, but rather warped human priorities to deliver us a vaccine that no one can assure is in any way safe. Were Moderna so certain it is safe, they should offer to be legally liable for any mRNA damage. They don’t, nor do any vaccine companies. We need to decide if the scale of the worldwide deaths, inflated or not, alleged to be of COVID-19, warrant such a human experiment that could alter our genetics in unpredictable and possibly toxic ways.”

18.05.2020 (F. William Engdahl)

The Warp Speed Push for Coronavirus Vaccines

http://www.williamengdahl.com/englishNEO18May2020.php

#GatesQuackVaccine

20.05.2020 (Robert F. Kennedy Jr.)

CATASTROPHE: 20% of Human Test Subjects Severely Injured from Gates-Fauci Coronavirus Vaccine by Moderna – Fort Russ

https://www.fort-russ.com/2020/05/catastrophe-20-of-human-test-subjects-severely-injured-from-gates-fauci-coronavirus-vaccine-by-moderna/

g00se
26 days ago

Thank you for the Intel floating-point paper link. I’d guess that machine derived f.p. errors, if they exist at all in this codebase, are amongst the most minor of problems.

This is a very useful discussion, but what worries me about it is that we are in a sense having our eyesight impaired by the sand thrown by ICL in the form of its refactored code. We should be commenting on THE (original) code. Yes, I know we can’t, as they are refusing to release it, but they must be pressed until they DO release it. Hence my footer:
RELEASE THE ORIGINAL CODE

Sue Denim
26 days ago
Reply to  g00se

Models like this one are iterative, so small initial inaccuracies in floating point calculations can rapidly be expanded to larger ones as the errors propagate. It’s for this reason that programmers know not to use floats to store monetary amounts: it can appear to work right up until compound interest calculations are performed and suddenly accounts don’t reconcile anymore.
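
The money example is easy to demonstrate; the same accumulation principle applies to an iterative C/C++ simulation:

```python
from decimal import Decimal

# Summing 0.1 a hundred times in binary floating point:
total = 0.0
for _ in range(100):
    total += 0.1
assert total != 10.0  # a tiny error has accumulated

# The same calculation in exact decimal arithmetic reconciles:
exact = sum(Decimal("0.1") for _ in range(100))
assert exact == Decimal("10.0")
```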

Note that the ICL code uses OpenMP for all its parallelism. The Intel paper observes that:

“Parallel reductions in OpenMP are mandated by the OpenMP directive, and can not be disabled by /fp:precise (-fp-model precise). They are value-unsafe, and remain the responsibility of the programmer.”

In other words there’s no way to make this code reproducible without rewriting every place where parallelism is used to not use OpenMP. Instead parallel reductions need to use a fork/join framework in which every task is given a deterministic label, the results are stored in an array indexed by that label and then summed serially at the end, to ensure the results are always added in the same order. Otherwise the non-associativity of floating point arithmetic can yield “randomly” varying outputs (it’s not random in a statistical sense, it just looks like it).
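
The two points above can be sketched in Python (the real fix would be in the model’s C++/OpenMP code, but the arithmetic behaves the same way): floating point addition is order-sensitive, and a fixed summation order restores reproducibility.

```python
values = [1e16, 1.0, -1e16, 1.0]

# Floating point addition is not associative, so the order of a
# parallel reduction changes the answer:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
assert sum(values) != sum(sorted(values))  # different orders disagree

# The deterministic pattern described above: each task writes its
# result into a slot indexed by a fixed task label, and the slots
# are then summed serially in index order.
def deterministic_sum(results):
    slots = [0.0] * len(results)
    for label, value in enumerate(results):  # "tasks" fill their slots
        slots[label] = value
    total = 0.0
    for s in slots:  # fixed serial order: reproducible every run
        total += s
    return total

assert deterministic_sum(values) == deterministic_sum(values)
```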

Norman Armitage
24 days ago
Reply to  Sue Denim

So, beware the butterfly effect.

Mike Whittaker
24 days ago
Reply to  Sue Denim

Errors “propagate” only if the calculation is divergent.

Raymond Wong
17 days ago
Reply to  Mike Whittaker

@Mike Whittaker
I think your comment is on point. I don’t understand why you are downvoted.

Feeding system output back to the input is a classic feedback-control system. Either the system will stabilize (convergent) or turn violent (divergent).
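
The convergent/divergent distinction can be illustrated with a toy feedback system (the logistic map, a hypothetical stand-in for an iterative model, not the ICL code), iterated from two starting points that differ by one part in 10^12:

```python
def trajectory(x0, r, steps=200):
    # Iterative feedback: each state is fed back in as the next input.
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))  # logistic map
    return xs

# Divergent (chaotic) regime: a perturbation of 1e-12 grows until the
# two trajectories bear no relation to each other.
a = trajectory(0.2, 3.9)
b = trajectory(0.2 + 1e-12, 3.9)
gap = max(abs(x - y) for x, y in zip(a, b))
assert gap > 0.1

# Convergent regime: the same perturbation dies out, so tiny floating
# point differences are harmless here.
c = trajectory(0.2, 2.5)
d = trajectory(0.2 + 1e-12, 2.5)
assert abs(c[-1] - d[-1]) < 1e-9
```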

earthflattener
26 days ago
Reply to  g00se

Just out of interest, what difference would it make if the original code was released? If the public refactored code, run with the same parameters, gives answers that are the same as the old code (or even of the same order as the old code), then nothing has changed. Most models, including back-of-the-envelope calculations, give similar answers with the same parameters (or equivalent), so the figures used for the decision do not depend on that particular run of that particular codebase. Now if the refactored code came back to tell us that covid was only going to kill 2 mice and a hamster, then you would have reason to grumble, but that doesn’t appear to be the case so far.

Miss Liss
25 days ago
Reply to  earthflattener

Surely we should still be concerned if most models gave similar figures, since this model’s figures have been totally discredited?

I mean, the accuracy of one approach to parallelism is one kind of concern, but there really do need to be some deeper questions asked if every model was equally wrong. Either they are all poorly coded, or everyone just missed something.

earthflattener
25 days ago
Reply to  Miss Liss

The whole point of mine, and most of the other highly experienced commentators’ comments, is to say that this model has NOT been discredited – at least not by Sue Denim’s two articles.

Moreover, the original estimates that he gave for what would happen with some lockdown are not too far away from what’s happened. Some people have focused only on the very high figures that the model suggested for what would happen if nothing was done. These figures are still not out of the realms of the possible, but luckily something was done – indeed something rather drastic was done. Too drastic? Well, that is a political question – but trying to blame a piece of code that actually works fine seems like a weird way to go.

For those who want to stop the lockdown – just admit you want to stop it. If you can show that many people have had the illness, so that it is less dangerous than we thought, then we are good to go. It’s the parameters – not the code – that may be wrong (and I say may, because I’m not convinced they are so badly wrong).

Russ Nelson
22 days ago
Reply to  earthflattener

The range of results suggested by the models are so broad as to be useless.

Mike Whittaker
24 days ago
Reply to  g00se

Many here appear to be “begging the question”, i.e. assuming the result they would like to see (the clue is in the website domain name).

Jay
26 days ago

“Someone realised they could unexpectedly recover parts of the deleted history from GitHub…” How could this have been unexpected?

Sue Denim
26 days ago
Reply to  Jay

GitHub garbage collects unreferenced objects in a repository in a background loop to free up disk space. When there’s no named way to access a change, which after the squash there wasn’t, it’s only a matter of time until the old code is removed. It appears in this case someone was able to locate the refs to the old code before GitHub’s garbage collection system reached it, and then gave the old code a name via a contribution (pull request).

This is rather like how you may sometimes be able to undelete files from your computer, but it really depends on a lot of things and isn’t guaranteed to work.

Mike Whittaker
24 days ago
Reply to  Sue Denim

Git does not remove detached objects until a “git gc” command is run.

Big Tony
23 days ago
Reply to  Mike Whittaker

But we are talking about Github garbage collection, so how you choose to use “git gc” is not really relevant.

Cynicus Maximus
26 days ago

I made a comment on the other article about peer review and the scientific method itself requiring transparency in order to work properly. It’s clear to me that, by refusing to release the original version of the source code, ICL is suppressing transparency to some extent. The problem with ICL staff saying that they “do not think it would be particularly helpful to release a second codebase which is functionally the same” is that, while they may know that the second codebase is functionally the same, no one else does. Furthermore, by refusing to release it, they’re denying everyone else the ability to verify for themselves whether the second codebase is functionally the same. This simply flies in the face of transparency, and thereby works against peer review and the scientific method.

dr_t
25 days ago

I am not sure you have understood how peer review and the scientific method work. You write a paper (in your area of speciality). You submit the paper to a journal. The referees review your paper and write an opinion. They judge it on the quality and novelty of your science – not your software engineering skills. If they think it’s good and novel, it probably gets published. If they don’t, it probably doesn’t. Nobody – publishing in biology, chemistry, physics, mathematics, epidemiology etc. – ever gives out the software they may or may not have used to obtain their research results for others to “peer” review. Heck, even in computer science, when people write a research paper, about, say some new operating system concept, they don’t publish the code implementing it, let alone all the experimental versions of the software they wrote on the way to obtaining their research results.

earthflattener
25 days ago
Reply to  dr_t

I’m loving this. Cynicus Maximus gets tons of upvotes for a piece of populist fluff which is riddled with non-sequiturs (e.g. if the corrected model gives similar results to the old model, whose results we already know, but is now more readable, then why would you want the old code… that’s what is meant by functionally the same). Breitbarters would call these upvoters the ‘sheeple’ – people voting but not understanding.
On the other hand, dr_t explains how things actually are – and gets downvoted for describing the world as it is. He didn’t invent it or even say he supported the process.

GerryM
22 days ago
Reply to  earthflattener

It is how the world actually is in the sense that most peer reviewers don’t ask to see the software, but if a paper is published the writers of that paper should make all their workings, programs, software etc. available to others to review on request.

bioinformatician
25 days ago
Reply to  dr_t

It’s not mandatory, but many researchers do release source code. I argue for it on projects I am involved in. Here’s an article from 8 years ago, “Ten Simple Rules for the Open Development of Scientific Software” – so saying “nobody” does it is clearly false.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002802

Norman Armitage
24 days ago
Reply to  dr_t

the quality and novelty of your science – not your software engineering skills
….. so if I write a program that predicts bees can’t fly, no one will want to check my program?

Mike Whittaker
24 days ago

How well does it agree with observations? Does it make verifiable predictions?

Russ Nelson
22 days ago
Reply to  dr_t

Publicly funded private code is unacceptable. You take public money, you make your code public. How is that not obvious?

dr_t
20 days ago
Reply to  Russ Nelson

How is that not obvious? Because it’s not true. Academia has never worked this way, nor can it work this way. Academics do research, and from time to time they publish their findings in the form of papers they write for this specific purpose. They never publish their internal working, the tools they used to produce those papers, nor do they release all their internal know-how. The deal is they get funded for a certain length of time, in exchange for which they have to produce papers (and fulfil any other agreed obligations, like teaching, administration, etc.), and when it comes to applying for more funding, they either get it, or not, depending on whether the output contained in those papers (and those papers alone – not their internal working) is good enough or not.

Beyond those papers, the know-how in the researcher’s head and their internal notes belong to them. If a researcher moves to another academic institution, he may use and develop ideas contained in those internal notes to create and publish new research, under the banner of the new institution (possibly in a different country, paid for by a different taxpayer, or even privately funded).

There are good reasons for this. Internal tools and documents and know-how may contain material for publishable new ideas. If they were released, someone else could publish new material based on those ideas and scoop the person who came up with them, before the author has had the time to develop those ideas and publish them. You publish only when you are ready and have enough for a publishable paper. If you don’t have enough for a publishable paper, by circulating your drafts, you are just sharing unripe ideas and risk someone else scooping you, and because there isn’t enough, you won’t get a publication either.

A software company operates on different principles; there, any partially developed program contributed to by a programmer, belongs to the company, and if the programmer is fired, someone else gets to work on that code (on which quite possibly many different people are working on all the time), and the programmer cannot take it with him.

This would simply not work in academia, as every academic is unique, has unique interests, determines their own research topic and agenda, in many fields, there are no teams, and when an academic leaves for another institution, and the academic institution then hires someone else, that other person will work on something completely different, whatever interests them.

Is that clear enough?

All this is obvious to anyone who has ever worked in academia (I used to, but not for a long time now). But I can see how it wouldn’t be obvious to someone who has never worked in that environment.

Laszlo
15 days ago
Reply to  dr_t

I fully agree with the way you describe the situation and partly with the conclusion: “Academia has never worked this way, nor can it work this way.” I believe it should work this way, and there are examples where it does; I make a comment on that a bit later. If academics do not work that way, what confidence can we have in their work?
I have worked for academia and for industry. Moving to industry made me realise how poor coding in academic work generally is. After 10 years I am part-time back in academic research. (We have always published our code along with our publications, by the way.) It _does not matter_ whether academics work this way or not as far as the result is concerned. Bad code is bad code, and code-writing principles are not about academia or industry – they are about how to program properly. Testing is needed, otherwise the code’s behaviour cannot be determined at all. You need good variable names and comments. You need to track issues and build testing infrastructure for your code. Your code must be as deterministic in isolation as possible. Will this make your code perfect? No. Is it necessary? Absolutely. Otherwise your work is prone to grow into just a pile of crap. Many issues in industry with legacy code arise exactly from poor practices, but even those poor practices are mostly not comparable to what this code – after rework! – shows. Legacy code is not only about age – it is about code quality and maintainability. If academic research cannot cope with the technical standards of _programming_ – which are there for a reason – then academics can be blamed with reason, or they should not write programs.
Software programming is a science in its own right, with a great deal of engineering experience and research in the field. If you write software, you move into this domain and you must first understand its rules and whys.
On top of that, transparency is a requirement in research – or at least it should be – especially because of reproducibility. And transparency is an obligation if publicly funded. If a paper draws conclusions using a piece of software, the authors must make it public. However, as far as I know the ICL prediction paper was not peer reviewed, so you could say: they just played around with their crappy code and had an opinion. Well, opinions are cheap; people tend to have many, even contradictory ones. Anyway, transparency rules do not apply, right? This opinion is not subject to the rigour of scientific work – more a kind of ‘preliminary report’. The net responsibility in this case lies with the decision makers who used it as a reference – making serious decisions on the basis of a paper that has not been properly evaluated is not acceptable. But the lack of pressure from the scientific community for transparency and evaluation is also a serious issue. Why? The answer is in what you wrote: they work that way. Regardless of whether they get money from taxpayers.

Simon
24 days ago

Cynicus Maximus, I agree with you. dr_t’s response is also right as far as it goes. But what dr_t is actually describing, correctly, is a flawed procedure. Due to its nature as unpaid, lowly, largely unrewarding work, peer review is only ever surface auditing. The substantive article is what gets reviewed – does it work on its own terms? Does it generate knowledge… on its own terms? Does it engage with the literature? Is it written clearly? And so on and so forth. The problem, however, is that – as dr_t notes – the philosophical or mathematical foundations from which the substantive article has been produced remain completely off the radar. That’s a pretty big deal. Who has time to check that type of thing? Imagine the time and unpaid study it would take to unpick the threads in a model that, on the surface, looks okay, but that, if you devote enough time and attention to it (or just happen to have a Google software engineer to hand to help you!), starts to fall apart. Which raises the rather unsettling idea that what we regard as science is, at least sometimes, undetected fraud or unintentional error. In a case like this, we become slightly more aware of that unsettling fact; but in actual fact, it’s the type of thing that’s happening all the time.

Mike Whittaker
24 days ago
Reply to  Simon

Why just “Google”?

Why not, e.g., a Chartered Software Engineer (C.Eng)?

Peter Tabord
23 days ago
Reply to  Mike Whittaker

I’ve been a programmer for 45 years, eventually rising to head of software engineering; now semi-retired. Not many people starting back then had degrees. It doesn’t mean we don’t know what we are doing.

Russ Nelson
22 days ago
Reply to  Peter Tabord

I have two and a half degrees in Electrical Engineering, but I’ve mostly done Computer Science my entire professional career. Don’t go overboard on credentialism.

Mike Whittaker
22 days ago
Reply to  Russ Nelson

I was just trying to avoid going overboard on someone being at Google, as compared with someone with a reviewed professional accreditation in software engineering.

Phoenix44
26 days ago

The issue is whether models like this are remotely useful for modelling an epidemic like this at an early stage. I would argue not. This is not a point about code but applies to any “bottom up” model that has very few fixed and known parameters. I can model an epidemic on a piece of paper – 40% of people get infected and 0.1% die; 5% will need ICU. Multiply those numbers by the population. If I don’t know the 40%, I can’t run a model with hundreds of assumptions to find it out, for the simple reason that I don’t know its components either. If I did, I would know the 40%. Thus the model simply tells me what the number of people infected would be if ALL the assumptions used to calculate it are right. But if all the assumptions are basically guesses, I might as well just guess the 40% and not bother with the model. And if you have hundreds of unknown assumptions, each with ten or more plausible values or ranges, then the possible meaningfully different combinations run into the hundreds of millions. So your chances of getting the model right are essentially zero, and your chances of being wrong are essentially 100%.

A much simpler model with the inputs varied and run using a Monte Carlo simulation would have been much more useful, producing a distribution of results, rather than a single (or a few) supposedly accurate forecasts. After all, people putting their own real money at risk use a tool like that. But what could academics learn from those sorts of people?

earthflattener
25 days ago
Reply to  Phoenix44

That is EXACTLY what they did!!!

The reason for the extra complexity in this particular version of the code is that the simple models from the 1920s assume a homogeneous population – so they struggle to give regional answers, to allow for gradual interventions (and particularly the time taken for an intervention to work), or to allow planning for hospitalisations etc.

But it is still Monte Carlo, and it varied the parameters within the values that were considered reasonable in mid-March (and they don’t appear to have changed too radically since then).

dhogaza
26 days ago

“In fact the second change in the restored history is a fix for a critical error in the random number generator.”

It appears to be a bug introduced in the C++ rewrite, not the original model used to generate the projections used by government.

g00se
26 days ago
Reply to  dhogaza

>>It appears to be a bug introduced in the C++ rewrite<<
Good for them. More opportunities for misdirection, focusing our attention on irrelevant (at this point) bugs.
RELEASE THE ORIGINAL CODE

Cynicus Maximus
26 days ago
Reply to  dhogaza

I could well be missing something here, but I don’t see how you can say that when the original code has not been made public.

Sue Denim
26 days ago
Reply to  dhogaza

I think you may have been confused by the cosmetic change to the comment at the top (which was clearly copy/pasted from some much older code years ago without anyone reading it) – the one saying “Pascal to C++”? The code in the file is the original code Ferguson used, plus whatever bug-fix changes have been made to it since – not a from-scratch rewrite. The contents have just been moved into a separate file.

The only actual code change in that page is the second one in Rand.h, where the value of Xa2vw is altered.

If you browse through the code you’ll see that although it nominally claims to be C++ it’s in reality C that’s had the file renamed. This is possible because C++ is backwards compatible with C to a large extent. Making it possible to use C++, a much more modern language, is a reasonable way to start improving the code quality although I haven’t seen much evidence of its features being used yet.

Although honestly I’m not sure why they’re trying to save this codebase. How can anyone trust its output ever again?

Another Anon Talking Head
26 days ago

Isn’t it interesting that you’re calling for greater transparency, yet you won’t reveal your own identity so we can verify your claims of expertise in this area?

NY
26 days ago

Given the political climate, and that the author claims to be employed by Google, I do not blame her at all for remaining anonymous. I remember what happened the last time a Google employee published a politically-incorrect position. It’s a tough situation – it would be better to know the identity, but that does not override the author’s need to protect her career.

Musica 2014
26 days ago
Reply to  NY

Who knows – being politically incorrect is a dangerous thing, but demagogic lobbying is a well-rewarded one; either way, discretion is mandatory.

Although I admit a big company would not like to be linked with the conclusion of the primary article, where, based on an analysis of the methodological and implementation flaws of the simulation software, the author subtly recommends: “Imperial’s modelling efforts should be reset with a new team that isn’t under Professor Ferguson, and which has a commitment to replicable results with published code from day one. On a personal level, I’d go further and suggest that all academic epidemiology be defunded”.

Mike Whittaker
23 days ago
Reply to  Musica 2014

That’s rather an agenda…

Another Anon Talking Head
26 days ago
Reply to  NY

In other words, “transparency for thee, but not for me”. Gotcha.

Lauren
25 days ago

The author critiques government policy rather than making it, so yes, she is perfectly entitled to anonymity.

Ignore her critiques if you like; no one is forcing them on you.

dr_t
25 days ago
Reply to  Lauren

But the author boasts that her analysis has been viewed by MPs and is calling for questions to be asked in Parliament on the basis of this post. She is therefore trying to influence policy, and possibly is influencing policy. If shadowy figures try to influence MPs and government policy, we normally don’t find this acceptable and require details of who is exercising such influence to be disclosed. If this were a private forum, I would agree with you, but evidently, it has gone beyond that.

Mike Whittaker
23 days ago
Reply to  dr_t

Also, the author is asking for de-funding of all academic epidemiology !

With the dogmatic Daily Mail crew in charge at the moment, rather a contentious agenda.

thelastnameleft
25 days ago
Reply to  NY

The author’s claim is this:

——
I worked at Google between 2006 and 2014, where I was a senior software engineer
——

Note how you conflate what is supposed to be a technical discussion about the merits or otherwise of some computer code with “political correctness”. (That, of course, is the whole point of Sue Denim’s articles – why does this article appear here rather than in a properly appropriate venue? Because it isn’t intended for proper discussion; it is designed to muddy the water, to poison the well, to activate public opinion toward a particular political goal, etc.)

NY
25 days ago

I didn’t catch that she doesn’t work there anymore. My point still stands – she has a right to protect her career.

If you believe politics don’t play a part here you are sorely deluded. Even highly respected, internationally renowned scientists who have given an opinion that doesn’t conform with the policy of lockdown and social distancing have been smeared and marginalized. I’ll give you my view – the goal of governments and some of these elite institutions isn’t science or the pursuit of truth, nor is it public protection. The goal is to push a narrative that justifies sweeping confiscations of power, personal liberty, and economic independence, and a fundamental restructuring of the social order.

Finally, I’m not sure what you consider to be an appropriate venue? Just because you don’t agree with this site’s message doesn’t invalidate the content of this article. I also can’t imagine it would be easy to get this article published by an outlet that you would consider credible. I remember how the media absolutely terrorized the public with Ferguson’s numbers. Why would they now publish information that shows his model to be a complete mess?

tim
19 days ago

The editor of Lockdown Sceptics has seen the credentials, I suggest; and this sounds like a plea to get someone sacked for being public-spirited. This isn’t Twitter, is it?

tlitb
26 days ago

“Sadly it shows that Imperial have been making some false statements.

ICL staff claimed the released and original code are “essentially the same functionally”, which is why they “do not think it would be particularly helpful to release a second codebase which is functionally the same”.”

This seems unfair. Surely that “essentially the same functionally” statement is not referring to anything about this GitHub refactor, but rather the code “used for Report 9” which is the old C code no one has seen yet.

Wes Hinsley says:
“We do not think it would be particularly helpful to release a second codebase which is functionally the same, but in one undocumented C file”

This older refactored code you have found may indeed have errors in it, but that is to be expected – it’s an older iteration of the process they are working through to get the refactored code behaving the same as the unseen C file. I don’t think it tells us more about that file.

Sue Denim
26 days ago
Reply to  tlitb

I think there may be some confusion about what “refactoring” means. The code published on GitHub is not a rewrite from scratch. It’s a direct continuation of the codebase used for Report 9. It’s full of code that is exactly the same as it would have been in that report, and if you (can) look you’ll see bits of code and notes by the authors that stretch back years. What people are asking for is the code as it existed at exactly the moment the Report 9 numbers were produced – without any subsequent changes.

There’s a comment above that seems to be under a similar confusion. C++ is called C++ because it’s a superset of C. You can normally convert a C program to a C++ program just by renaming the file and making a few cosmetic changes. The “unseen C code” thus isn’t really unseen, it’s just that we’re seeing it with a variety of alterations made on top. Although some of these alterations are critical to the results, they’re overall quite small – you can’t extensively modify 15,000 lines of code in just a few weeks, after all.

tlitb
26 days ago
Reply to  Sue Denim

Thanks for the reply. I think they should release the original file too. However, I’m not sure all of the errors mentioned can be assumed to have existed in the original code; for instance, the random number one seems to have been introduced recently by somehow losing the first numeral. The correct number is found in a 30-year-old paper. A bad cut and paste?

Your point about floating point accumulating errors that are not truly random is very interesting.

Sue Denim
26 days ago
Reply to  tlitb

Why do you think it was introduced recently? It’s the second change in the squashed history. Do you have evidence it was introduced between March 16th and April 1st?

tlitb
26 days ago
Reply to  Sue Denim

“Do you have evidence it was introduced between March 16th and April 1st.”

No, I don’t, but the fix is dated 25 March and the comment by Ian Lynagh is:

“RNG: Fix a transcription error

The PDF of the paper says 784306273, but we had 84306273.

Wolfram alpha confirms that
40692 ^ (2 ^ 50) mod 2147483399 == 784306273
which gives me confidence that this is an error in our code, not the
PDF.”

When he said “our code” I assumed he was talking about the refactor. Or is he working as part of the Imperial team? His LinkedIn says he’s a software guy associated with Semmle Ltd and Oxford University.

earthflattener
26 days ago
Reply to  tlitb

The other comment by Ian Lynagh mentions that in the original usage of the code the second pair of seeds was used, so the answers are unaffected… “we” seems to refer to Ian and the others who are working on it:

There were a couple of places where we use
P.newseed1 = (int)(ranf() * 1e8);
to make new seeds, but Rand’s setall already implements RNG splitting, so we now use that instead.

This needed a few knock-on changes in the places where we seed the RNG, as if we wanted to reuse the seed values then we now need to store them.

Finally, we now also split the seeds used for the setup phase, and reinitialise the RNG with them after the network has been created or loaded. It looks like historically the second pair of seeds had been used at this point, to make the runs identical regardless of how the network was made, but that this had been changed when seed-resetting was implemented. Now the determinism is back, fixing #116.

Brian Sides
25 days ago
Reply to  earthflattener

P.newseed1 = (int)(ranf() * 1e8)

So what does the P stand for? Do they have something against meaningful variable names?
The (int) cast truncates the returned floating-point value to a signed integer. The C standard only guarantees that an int can hold values from -32767 to 32767 (16 bits); on most modern systems it is 32 bits, but the range depends on the system.
ranf is also the name of a Fortran function that generates a random number, supported through the Intel Short Vector Math Library (SVML).
RAND_MAX is a macro that expands to an integral constant expression whose value is the maximum value returned by the rand function. This value is library-dependent, but is guaranteed to be at least 32767 on any standard library implementation.

Plenty of room in the above for different libraries, compilers, and systems to produce different results.

Computers have difficulty producing truly random numbers.
I once wrote a lottery program for the Isle of Wight. I remember I had to produce a 32-bit random number.
We once submitted a program to a company in Norway using our standard random number generator method. But they rejected the method we had used, and we had to rewrite it to meet their specifications.

earthflattener
25 days ago
Reply to  Brian Sides

Indeed. But imagine you hadn’t sold it to Norway – would you have had to change it then? This is in-house code – why should they have cared originally about portability?

Brian Sides
25 days ago
Reply to  earthflattener

We thought the Norway objection was very theoretical. But we wanted to sell them our programme, and they wanted us to use the random number generator the way they suggested. So we rewrote the code to make them happy and get the sale. This was back in 1999. I also wrote some code at that time to avoid the millennium bug.

earthflattener
25 days ago
Reply to  Brian Sides

I upvoted you, because I feel your pain 🙂
Your point about the RNG is well made, but portability was not the objective of the code. So it doesn’t support the idea that the code was utterly untrustworthy – though I don’t think you were trying to say that anyhow

Philip Oakley
23 days ago
Reply to  tlitb

It is quite rare for a typical random number generator even to be able to fully shuffle a deck of cards (52! combinations, about 226 bits). Overall, the model simulations are already well known, so this nit-picking and whataboutery does not advance the wider discussions about policy and strategy; it just focusses on local tactics (winning fights but losing battles and wars).

For comparison, choose a percentage needed for herd immunity and a percentage death rate for those who have been infected, and see what that implies as the expected number of deaths before covid becomes like a ‘normal (herd immunity) flu’ (the UK has ~67m people)…

zebedee
19 days ago
Reply to  Sue Denim

To be pedantic, C++ was a superset of C but is no longer – e.g. C99 has built-in complex number types, whereas C++ has std::complex.

Mike Whittaker
24 days ago
Reply to  tlitb

“Functionally identical” to me implies that at least all of the unit tests pass in both cases.

earthflattener
26 days ago

You were so happy quoting John Carnack in your first article. If he says there is a problem, then it must be so…blah, blah. Yet here are his comments now (from https://github.com/mrc-ide/covid-sim/issues/144#issuecomment-625151694)

Seeing as you have quoted John Carmack’s twitter, I hope that you find his comments encouraging when he writes, “it turned out that it fared a lot better going through the gauntlet of code analysis tools I hit it with than a lot of more modern code. There is something to be said for straightforward C code. Bugs were found and fixed, but generally in paths that weren’t enabled or hit.”

Not a bad recommendation, huh?

If the author had integrity, they would have posted their comment on a software site. Instead they chose to do it on a site with a far right political bent. The objective is clearly to deceive rather than to provide objective analysis.

The author states
“The deeper question here is whether Imperial College administrators have any institutional awareness of how out of control this department has become, and whether they care. If not, why not? Does the title “Professor at Imperial” mean anything at all, or is the respect it currently garners just groupthink? ”

I suspect you would find answers to why ‘she’ has chosen to write this in her personal history. Embittered failed academic? In an open marriage with someone who has a lover at Imperial? The options are endless – and just as weak a speculation as her critique of the code.

Finally, if the author is to make any kind of reasonable point, it is insufficient to give examples of numbers from single runs, when we already know there was a bug in saved state (though apparently not in the original usage of the code). At the very least, give us the result of the ensemble solution, so that we can compare apples with apples.

Tarquin
26 days ago
Reply to  earthflattener

You lost me at “far-right”..

earthflattener
26 days ago
Reply to  Tarquin

To be fair, you were probably only here to gather ‘evidence’

Sue Denim
26 days ago
Reply to  earthflattener

I don’t plan to get into a long back and forth with you based on your posts to the other thread, but I’ll reply once.

John Carmack (not Carnack) is by all accounts a tremendously nice guy. Despite his reputation, his opinion about this program’s quality is wrong. It was and still is riddled with severe bugs. I don’t know why he is defending a codebase with broken RNGs, broken shuffles, reads of uninitialised variables and memory, non-replicable outputs, no unit tests etc. As he admitted, some of the tropes about academic code are true. I only see one (quite small) change from him in the history, so perhaps he just didn’t spend much time looking at it.

Fortunately, anyone who either is a programmer or knows one doesn’t have to play the game of asking which out of Microsoft or Google or Imperial is a more “expert” employer. They can evaluate the claims directly, as evidence for every statement about the code is provided. If my “objective is clearly to deceive” then providing so many links directly to evidence is a strange way to do it.

Your fantasies about being able to discredit these facts via personal smears might indicate why it’s better to remain anonymous. But just to satisfy your needs – I have no connections with Imperial, or its staff, or British academia whatsoever, and never have. You wouldn’t know my name even if it was revealed, as I’m nobody famous.

Finally, as I’ve already explained above, averaging the outputs of buggy code yields an invalid answer. Calling it an “ensemble model” doesn’t change that.

earthflattener
26 days ago
Reply to  Sue Denim

Well, I’m sorry if it took a bit of character defamation to get you here, but let’s be clear: you have done your very best to smear Ferguson, Imperial epidemiology and basically all mathematical modellers who program, for the sake of pushing an agenda.

Now can you answer direct questions?
1) Can you show that Imperial’s usage of the model is incorrect? You keep showing that a single realisation is different. All of us are saying that doesn’t matter because of how the code is used. But you persist with this, and your paragraph “Effect of the bug fixes” is not clear about what exactly you are testing. See my first paragraph below for why you need to answer this.
2) Did you establish that there WAS significant underflow or overflow in a typical run of the ensemble, or did you just provide a helpful link to what to do if such a situation were to arise?
3) The random shuffle bug leads to addressing errors – very likely to cause a crash, and most unlikely not to have been caught in the original. Are you referring to the original code here?

Your main argument is that a bug in the storage of a seed in a saved state can somehow invalidate the usage that Imperial made of this code. For a start, the evidence points to the fact that this bug did not exist in the version that ran. Secondly, running a suite of runs to produce the ensemble does NOT invoke the bug, so the answer will not be affected by the seed issue – even IF it did exist in the version that they ran. That is the major issue. Averaging a series of outputs of consistent code does converge to the ensemble result. You have not shown, or begun to show, that this is not the case. So your example of rerunning the model with face masks is wrong. You need to compare ensemble results, not run results.

You are no doubt correct that in an ideal world, code ‘should’ be managed by professionals. But it’s horses for courses. I write code for algorithm development – so something that was n**2 becomes n*log(n) or whatever. It’s generally pretty bad code, even with uninitialised variables – fine for my compiler. Is it portable? No. Is it robust to user error within a fairly rigid protocol for usage? No. So before it goes commercial it spends a loooong time with some brilliant software writers. But they look to ensure that my numerical work is replicable – you see nothing wrong in the results even though the code, if used incorrectly, is shaky. If the code is only used by 5-10 experts who understand the limitations – no issue! That is the nature of scientific code. It is nonsense to say all academic, research or specialised code needs to conform to commercial development requirements. Nothing would get done. It’s too slow for experimental or original work.

Tom Morgan
26 days ago
Reply to  earthflattener

‘I write code for algorithm development – so something that was n**2 becomes n*log(n) or whatever. ‘
Ummm … n**2 is in no way the same as n*log(n) (or whatever) – what is it you are trying to say?

dr_t
25 days ago
Reply to  Tom Morgan

What he is saying is obvious. When he is doing research and developing novel academic code for research purposes, which code is constantly changing, maybe being thrown away and rewritten from scratch, he does not necessarily expend resources to optimize or document the code or write it in a way in which you would write code intended for an industrial strength release or a contribution to a professional software development project. He doesn’t e.g. optimize algorithms with a running time of n**2 even though a n*log(n) algorithm is known. He writes a program that satisfies the purpose at hand. If the code then needs to be made to run faster, it is optimized. If it needs to be handed over to someone else, it needs to be documented. Etc. But you don’t do the work that’s not necessary to achieve your purpose, because that is a waste of limited resources. I don’t know anyone sensible who does things differently.

FactsNotFaces
25 days ago
Reply to  dr_t

Except any professional programmer can tell you that today’s throwaway code, which you’ll replace with something proper later, ends up sitting there for a long time afterwards. Indeed, this code goes back fifteen years. We have software development practices and methodologies specifically to mitigate this problem; Imperial College obviously do not use them. You talk about writing something that “satisfies the purpose at hand”. A decade and a half of growing cruft culminating in a 15,000-line C file is not such a scenario. It’s bad practice and liable to introduce bugs – which we have seen was the case.

They need to release the original code that they actually used, not this.

dr_t
25 days ago
Reply to  FactsNotFaces

We all know that. Is it really so difficult to understand that Ferguson is not a “professional programmer” but a scientist doing research in epidemiological modeling, and Imperial is a university, not a software company?

Tom Welsh
25 days ago
Reply to  dr_t

Is it really so difficult to understand that software written by one or more amateur programmers should not be used to shut down an entire nation for months on end?

If Imperial had no competent professional programmers – which seems to have been the case – surely they could have hired some before using the results of their highly dubious program to bring about the greatest destruction of British liberty ever?

dr_t
24 days ago
Reply to  Tom Welsh

Except none of that is true.

The nation was not shut down because of any piece of software. It was shut down because there is a deadly virus on the loose, and the ‘herd cull’ policy the government was previously pursuing, and its total failure to act by e.g. sealing the borders, were alarming to people, had no public support and were unsustainable. Blame the Chinese Communist Party, not Ferguson. Most of us worked out that this was a very dangerous virus well before Ferguson had anything to say, and lots of independent predictions point the same way: many, many more deaths if the epidemic is allowed to continue on its exponential growth trajectory. More than 3/4 of the population want the lockdown to continue, and many people would rebel against its lifting, would not be willing to work other than from home, and would not send their kids back to school. Virtually every country in the world has adopted epidemic mitigation measures; most countries – and all the successful ones – adopted much more effective measures and did so much earlier than the UK, most of them well before Ferguson had ever spoken in public. You don’t seriously think that all of us who worked this out for ourselves, and all the governments in the world, did so because of Ferguson’s program?

It seems that you don’t understand how universities work. Individual researchers do their research independently; they are specialists in their own fields, not in computer science or software engineering, unless they are in the computer science department. I am sure plenty of Imperial’s computer science professors are competent programmers. But a researcher does not have a software engineering team working for him, nor is he able to hire one. You have a typical large-corporate attitude and mentality, criticizing people who are, and can only be, one-man bands on the grounds that they have not expanded their software development department enough.

You also don’t seem to understand much about liberty. The limitations imposed are not unprecedented; you can rest assured that the restrictions imposed were much greater during the time of the Black Death – people were physically bolted into their houses, just as they are today in China.

How much liberty is there for anyone when the streets are infested with infected people spreading deadly germs around, threatening your life and limb at every opportunity? I have been arguing, and David Friedman, a well-known anarcho-capitalist, recently said the same, that in a truly free world, private law would have evolved so that those who impose involuntary negative externalities on others by spreading infection would face consequences – whether by having to pay restitution or by facing criminal charges for assault, grievous bodily harm, attempted murder, murder, accidental killing, etc., whichever applied. Unlike now, when people can violate others’ life and limb without consequence, under such incentives people would stay at home anyway, lest they risk going to prison for killing someone by infecting them. Also, you don’t seem to comprehend that you lost your liberty a long time ago. The government owns the roads, public transport, healthcare, and other infrastructure. If you are a true believer in liberty, this is what you should be questioning, and you should also realize that the owner of something can restrict the use of that something as they see fit – otherwise they are not an owner. In particular, the owner of the roads can prohibit anyone else from using them, which in effect means locking people down in their homes. In a fully free market, private owners of infrastructure who faced potential liability for spreading infection would likewise impose restrictions on the use of their infrastructure by third parties, lest they face consequences for participating in the injury and killing of others.

What you are proposing, i.e. lifting the lockdown and letting a bunch of infected people loose without facing any consequences for their disease-spreading, is both nuts and not liberty at all.

Barney McGrew
24 days ago
Reply to  dr_t

“…those who impose involuntary negative externalities on others by spreading infection, would have to face consequences”

I’d be willing to bet that in the past you have knowingly left your house during a flu epidemic and gone to shops, workplace, pub. I’d also bet that you’ve “struggled in” to work when suffering from a cold – which could have been the early stages of flu.

MURDERER!!

dr_t
24 days ago
Reply to  Barney McGrew

And you would lose your bet. When I am sick, I stay at home and cancel my meetings. It’s called “being polite”.

GerryM
24 days ago
Reply to  dr_t

The nation was locked down because of a forecast from an extremely dodgy program that there would be 500,000 deaths if we didn’t do so. It is completely disingenuous to pretend that the government had a choice once this number got into the public domain. I agree that using a piece of software to solve an academic problem doesn’t require the full-monty software engineering care that goes into commercial software development, but if you are creating a piece of software to forecast the likely paths of pandemics then you damn well should go through the full-monty software development process.

Mike Whittaker
22 days ago
Reply to  dr_t

Also, computer science is not the same as software engineering!

malf
22 days ago
Reply to  dr_t

English Law clearly is not your academic jam. It is mine, so I will tell you what Liberty means:

“Freedom is the natural faculty of doing what each person pleases to do according to his will, except what is prohibited to him of right or by force. Servitude, on the other hand may be said to be the contrary, as if any person contrary to freedom should be bound upon a covenant to do something, or not to do it.” (Henrici de Bracton, De Legibus et Consuetudinibus Angliae, Sir Travers Twiss, Q.C., D.C.L., trans. London: 1878, p. 369.)

So the question is what is prohibited of right, that is, of natural right, because liberty is a natural faculty. Acts of Parliament, Orders of Health officers, etc. are servitudes. What you are supporting is called involuntary servitude. Also, you are hysterically exaggerating the deadly nature of the virus—to people under 50 it poses essentially no risk. In my Province, British Columbia, the average age of death has been 85, and social distancing and lockdown measures are not responsible for that.

What you’re showing is why the feudal University system is obsolete, that’s all. It doesn’t deserve to exist; it was never anything more than part of the feudal hierarchy. The highest degree Oxford used to issue was Doctor of Divinity, followed by Doctor of Civil Law, followed by Doctor of Medicine. So Divinity and Law outrank Medicine—to this day, Medicine is governed by Law, not Law by Medicine.

If we’re going to have a technocracy like you seem to envision, we should at least have first-rate technocrats, not just prats whose claim to fame was going “yes boss” to their advisors in grad school, realizing that not rocking the boat was the surest path to tenure and decades of fun goofing off on campus.

CDH
20 days ago
Reply to  dr_t

Most of what you’ve written is utter tripe!
The nation was shut down on the basis of a report that used data from this software. Do you have any scientific evidence that the lockdown has made the slightest difference to the progress of the disease? That would be observable, testable, repeatable and falsifiable evidence?
Epidemiological data is not evidence. And watch out for the post hoc, ergo propter hoc fallacy; you wouldn’t want to fall into that trap.
You seem to think that politicians are intelligent; most are not.
3/4 of the population want lockdown to continue because they have been scared shi1less by the ridiculous propaganda pumped out by the Government and PANIC TV, the BBC.
There has been no consistency between the actions taken by countries and the resulting numbers of deaths.
The goal of baffling the gullible has certainly been achieved.
It is almost universally believed, without evidence, that home detention makes us safer.
My favourite quote is from the Washington Times:
“The response to the coronavirus is hyped. And in time, this hype will be revealed as politically hoaxed.
In fact, COVID-19 will go down as one of the political world’s biggest, most shamefully overblown, overhyped, overly and irrationally inflated and outright deceptively flawed responses to a health matter in American history, one that was carried largely on the lips of medical professionals who have no business running a national economy or government.”

Mike Whittaker
24 days ago
Reply to  dr_t

Imperial College (whose mainframe was the first computer I ever used!) has a computer science department. There, they will practice software engineering, which is the application of engineering principles – robustness, repeatability and reliability, accountability and governance – to software.

malf
22 days ago
Reply to  dr_t

The basic mythology of the University system is that it is always better than the private sector—everyone who has worked in both knows this is entirely false. Really, the University system only appears “smarter” or “better” in fields where you need a degree to practice, e.g. law, medicine, architecture, engineering. In everything where you can just hang out a shingle, e.g. software development, you get better quality when market forces are involved. Tenure does not produce quality anything, but especially not software.

You’re a good example of the impudent arrogance of the University set—you are basically saying University scientists aren’t going to be “professional” programmers, but we’re supposed to subsidize them to the tune of billions of dollars every year (globally) and use their conclusions to drive public policy? Why?

How can you out of one side of your mouth say ‘we can’t expect university researchers to be professional programmers, we can’t expect Universities to produce software as good as a software company,’ and out of the other side say ‘but we should trust their software and use it to drive policy’? Why should we subsidize second (or lower) tier intellectuals? Why should we give them this much power? The 20th century computer/internet revolution shows precisely that we _do not need_ these universities anymore.

Rob
18 days ago
Reply to  malf

If your academic discipline involves mathematical modelling, it requires software development. If your academic discipline requires software development, you should be trained in software development. If the work of your academic discipline is going to be used to inform public policy, the public need to feel confident that your work is at a high enough standard.

(I have a background in physics, mathematical modelling and software development and I work as a software developer.)

earthflattener
25 days ago
Reply to  FactsNotFaces

Why do they need to release the original, exactly? What is the agenda behind such a demand?
They have released a version that is a bit tidied up. It seems to give the same answer as the previous one. So, what are you going to learn, exactly? You know, you get the same answer (more or less) with a simple SEIR model. All the scientific teams arrived at similar conclusions using somewhat different algorithms and entirely different codebases. What they share in common is a similar set of parameters. If something is dreadfully wrong with the predictions (and there is no sign of that yet), you will find the problem with those, not by trying to prove that a couple of fairly irrelevant bugs are the whole problem.

Barney McGrew
24 days ago
Reply to  earthflattener

You are unknowingly stating the problem: the whole thing is a circularity, with guesses feeding models whose outputs feed into research whose output feeds government policy, which then feeds back into the models and so on. It is groupthink.

youreamoron
13 days ago
Reply to  Barney McGrew

It’s actually called statistics.

duncanpt
24 days ago
Reply to  FactsNotFaces

Indeed. I remember hearing once, in the computer audit days of my career, that NatWest Bank’s main accounting software still had pounds, shillings and pence somewhere at its core. Throw-away code is rarely thrown away: most programmers are essentially packrats at heart.

Mike Whittaker
22 days ago
Reply to  Tom Morgan

As in e.g. sort algorithms,
O(n log n) is better than O( n^2) for increasing n.

Depends on the multiplicative factor in real life though!
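The point about multiplicative factors can be made concrete with a toy cost model. The constants below (8.0 and 0.5) are made up purely for illustration, not measured from any real sort:

```python
import math

# Hypothetical per-operation constants -- chosen for illustration only.
C_NLOGN = 8.0   # e.g. a merge sort with heavy per-step bookkeeping
C_N2 = 0.5      # e.g. a tight insertion-sort inner loop

def cost_nlogn(n):
    return C_NLOGN * n * math.log2(n)

def cost_n2(n):
    return C_N2 * n * n

# For small n the "worse" O(n^2) algorithm is actually cheaper...
print(cost_n2(16) < cost_nlogn(16))        # True
# ...but the asymptotics win in the end.
print(cost_nlogn(4096) < cost_n2(4096))    # True
```

This is why real standard libraries often switch to insertion sort for small sub-arrays inside an otherwise O(n log n) sort.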

Lewian
26 days ago
Reply to  earthflattener

Being an academic modeller and coder myself (and finding the place where this was posted dodgy as well), I have some understanding of your position. However, from my programming experience I know that you need to be able to reproduce runs with fixed seeds if you ever want to be sure you can find bugs and correct them. I know a good deal about ensemble models, but I don’t buy the defence that you don’t need reproducible runs because things average out and the overall ensemble will be fine anyway. My impression is that if you can’t reproduce your runs, you can’t know what you’re averaging or whether it’s at all reliable. Nothing you have written up to now has convinced me otherwise, and that’s a pity, because I’d like to be convinced.
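The fixed-seed point is easy to illustrate. The function below is a stand-in for a stochastic simulation run, not the Imperial model’s actual interface:

```python
import random

def run_model(seed, draws=5):
    # Stand-in for one stochastic simulation run: a fixed seed fixes
    # the entire stream of random draws, and hence the output.
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(draws)]

# Identical seed => identical run. This is what makes bugs findable:
# you can replay the exact run that misbehaved.
print(run_model(42) == run_model(42))   # True
# A different seed gives a different, but equally reproducible, run.
print(run_model(42) != run_model(43))   # True
```

An ensemble average built from such runs is itself reproducible, because each member can be replayed from its seed.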

earthflattener
25 days ago
Reply to  Lewian

Fully agree with you, Lewian. There is currently a bug, and it should be fixed (actually, that was last week – it is fixed). The question, though, is what the nature of the bug is, when it was introduced and, most importantly, whether it affected the original run. As the quote below shows, runs were reproducible with the original code, so there was no issue then. The author of the critique knows this, but continues to insist that the code is completely flawed due to a bug that did not affect the original, so one might think that the point being made is not about trying to improve the science.

The proof that the issue did not affect results is in the quote from one of the guys who are porting the code to C++ and improving its style. Here is his quote:

“There were a couple of places where we use
P.newseed1 = (int)(ranf() * 1e8);
to make new seeds, but Rand’s setall already implements RNG splitting, so we now use that instead.

This needed a few knock-on changes in the places where we seed the RNG, as if we wanted to reuse the seed values then we now need to store them.

Finally, we now also split the seeds used for the setup phase, and reinitialise the RNG with them after the network has been created or loaded. It looks like historically the second pair of seeds had been used at this point, to make the runs identical regardless of how the network was made, but that this had been changed when seed-resetting was implemented. Now the determinism is back, fixing #116.”

You can find the quote here https://github.com/mrc-ide/covid-sim/pull/121
I think this puts a different complexion on it, do you agree?
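For readers unfamiliar with the issue in that quote: deriving a “new” seed from the generator’s own truncated output (the `(int)(ranf() * 1e8)` pattern) can produce colliding or correlated sub-streams, whereas deriving every sub-stream seed from a single master seed keeps them independent and replayable. A stdlib Python sketch of the two patterns follows; it is not the model’s actual `setall` implementation:

```python
import random

def seed_from_output(rng):
    # The fragile pattern: a "new" seed taken from the RNG's own
    # output, truncated to at most 1e8 distinct values.
    return int(rng.random() * 1e8)

def split_seeds(master_seed, n):
    # The safer pattern: every sub-stream seed is a deterministic
    # function of (master_seed, index), full-width and replayable.
    master = random.Random(master_seed)
    return [master.getrandbits(64) for _ in range(n)]

seeds = split_seeds(2020, 3)
# Re-deriving from the master seed gives the same seeds, so any
# individual sub-stream can be replayed in isolation.
print(split_seeds(2020, 3) == seeds)   # True
print(len(set(seeds)) == 3)            # True: distinct sub-streams
```

The same idea underlies proper RNG-splitting schemes such as numpy’s `SeedSequence.spawn`.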

Lewian
25 days ago
Reply to  earthflattener

Fair enough. It’s something of an indication, but I don’t think we can know that originally things were fine. Without having seen the original code that was actually run, how can anyone tell? It *may* have been OK (at least in this respect), or not.

Barney McGrew
24 days ago
Reply to  Lewian

“and finding the place where this was posted dodgy as well”

What does that mean?

It used to be respectable to question government mistakes, propaganda and authoritarian tendencies. I used to believe that Britain would never fall into the abyss because there was this innate, intelligent rebelliousness. But now..? It’s ‘dodgy’ if you question what the government–media symbiosis churns out.

Rob
18 days ago
Reply to  Barney McGrew

Nowadays if you criticise any ‘mainstream’ position you are automatically ‘far-right’ or a ‘conspiracy theorist’.

thelastnameleft
25 days ago
Reply to  earthflattener

17 hours and no reply from “Sue Denim”.

Note, no other replies from Sue Denim addressing specific technical points.

From that one can see why these articles have been published here (a highly partisan, political venue) rather than a more appropriate technical venue where the actual issues (if any) might be properly addressed.

What is the audience for this ‘critique of academic code’? It’s neither academic nor technical. And that’s the level of the ‘critique’. Says it all.

NY
26 days ago
Reply to  earthflattener

Why get personal like that? Could it be that perhaps someone with experience saw the code that had been used to justify a drastic, seismic, unbelievably damaging policy for hundreds of millions of people (whether you believe the tradeoff is worth it or not), and decided to write an article about it? Why should the site of publication be relevant? You say it is far right – now, I’m new here so I can’t say, but since when is opposing a totalitarian regime of restrictions on people’s lives “far right”? I would even call such a position liberal. But this is irrelevant to this discussion. All of the points made here are regarding flaws in a software model.

earthflattener
26 days ago
Reply to  NY

The attempt here is to smear the model. It is being done by pointing at a handful of bugs which have so far proved irrelevant to the actual use of the code by the modelling team. If it were to offer constructive criticism about coding issues – and there is always scope for that – it would be better placed somewhere more appropriate than Toby Young’s site (don’t forget to make a donation to ‘the cause’). I don’t have a problem with someone arguing that the lockdown is wrong.

Young himself says “Even if we accept the statistical modelling of Dr Neil Ferguson’s team at Imperial College, which I’ll come to in a minute, spending £350 billion to prolong the lives of a few hundred thousand mostly elderly people is an irresponsible use of taxpayer’s money.” Now I don’t even have a problem with people saying that. There is always a tradeoff in medicine between money spent and lives saved.

What I object to is trying to smear the science to support the underlying view that a lockdown is not worth it to save those lives. This is a trope of many on the right (note I wasn’t saying nazi or fascist – just that they are on the far-right side of politics) – undermine science so you can push an economic agenda. I object to the attempt to pull the wool over people’s eyes for the sake of any political cause, left or right.

Brian Sides
26 days ago
Reply to  earthflattener

NASA lost its $125-million Mars Climate Orbiter because spacecraft engineers failed to convert from English to metric measurements when exchanging vital data before the craft was launched.

As we do not have the original code or test results, we cannot know if the code is faulty.
But from the information supplied it would seem that it could not have passed reliability testing or met the most basic of standards.
The code that has been made available appears to be in the development stage.
A number of bug fixes have been posted.

The model uses the code, so if the code is faulty the results may also be faulty.
As for the method of taking available data and then making assumptions to form a prediction:
even if the code works, taking drastic action based on such questionable predictions is a risky strategy. If the prediction had been a possible 50,000 deaths without lockdown, as opposed to 500,000 deaths, a different action may have been taken.

VanFa
26 days ago
Reply to  Brian Sides

Well, this is a problem of the US being, as the only country in the world, relentlessly ignorant and clinging to these outdated units of measurement without any sensible argument except “tradition”.

As stated above, no country is stupid enough to just rely ON ONE SINGLE researcher. There are expert teams in many countries; e.g. the team around Drosten in Germany, which specialises in coronaviruses, invented the first test for COVID-19. Have they come – independently – to different conclusions? I have not heard about it.

NY
26 days ago
Reply to  VanFa

There are plenty of different conclusions from respected scientists and institutions all around the world. I could throw out a few prominent names of those who believe lockdowns are the wrong way to go – John Ioannidis, Knut Wittkowski, Hendrik Streeck, Scott Atlas, Sucharit Bhakdi, Pietro Vernazza, Detlef Krüger, Johan Giesecke, Carl Heneghan… I could go on. Some claim it is no more lethal than the flu. Some claim it is somewhat more lethal. Some even claim it is less. Resoundingly, though, they believe the lockdowns are doing more harm than good. Shouldn’t the goal of any policy be maximizing total benefit and, by corollary, minimizing total risk?

earthflattener
25 days ago
Reply to  NY

Different matter though, NY. The code didn’t suggest what form of lockdown should be imposed; it simply showed a set of possible scenarios.
The issue in this thread is about the code, and whether some minor bugs in it mean that you can’t trust anything Imperial say on the matter. That is completely unproven.
This doesn’t mean we should not be looking at how to get out of lockdown, but these kinds of models can help. The better the data, the less uncertainty in the estimates. We have more data now, though still not enough. More data is what we should be asking for, not some stupid post-mortem on a C routine.

The Phantom
20 days ago
Reply to  earthflattener

“The issue in this thread is about the code, and whether some minor bugs in the code meant that you can’t trust anything that Imperial say on the matter.”

This code, even in its present semi-patched form, gives ‘non-deterministic outputs’ and different outputs depending on which computer it is run on. Meaning the previous, un-patched version was worse.

As the author notes, the variance in output predictions was larger than the maximum emergency efforts of the Nightingale Hospital.

Meaning it’s not useful to a policy maker.

You however are adamantly defending the use of this code as suitable for policy, and smearing the author as a “right winger” for daring to call it into question.

Two questions for you: first, how did you come to the conclusion that being a “right winger” is a bad thing, to be deplored? Second, have you noticed how many down-votes you’re getting? No one is buying it.

But really, the biggest question is how you can think it reasonable that academics not release their source code on request from other researchers. It’s malpractice and borders on malfeasance.

This is a PANDEMIC EMERGENCY. If the model is wrong (and measurements made in Reality are suggesting that it is -wildly- wrong) then everyone will benefit from knowing -why- it is wrong and what can be done to fix it. This is not the last pandemic there will ever be, I’m sure.

If it turns out that the model is a fraud and the researchers fraudsters, we will all benefit from knowing that as well. Currently I’m leaning more in that direction the longer they refuse to release their code.

Mike Whittaker
24 days ago
Reply to  NY

Sounds a bit like “the end justifies the means” …

earthflattener
26 days ago
Reply to  Brian Sides

If professional software was perfect, I wouldn’t get the blue screen of death from time to time.
The 500,000 is not absurd for a no-action outcome even still. Just scale up the deaths in NYC to the US as a whole and you get about 450,000 (or will do in about 2 weeks time). Since there is at most 25% of people affected in NYC, then unrestricted, that figure could have been 1.5 million.
The model didn’t prescribe how things should be done, use of police etc.; it just showed that fairly robust measures appeared to be needed.
Anyhow, my goal is not to defend the actual numbers from the code, just to counter false attacks.

ahamilton
26 days ago
Reply to  earthflattener

Lmao. just “scale up” from the worst affected state with the most densely populated city, assuming uniform effect across a huge sparsely populated country, then multiply that by 4 and the numbers are actually quite close.

earthflattener
25 days ago
Reply to  ahamilton

Sorry about you laughing your ass off. I guess that’s the last we will hear from you, now that your brain is gone.
You see, a virus, left unchecked, will spread throughout the country. That’s why people do get flu in the countryside. The back-of-the-envelope calculation that I mention works at long times in the case of unimpeded spread – which was the worst-case scenario that was originally modelled.

GerryM
24 days ago
Reply to  ahamilton

We are getting an insight into the mindset of the modellers from earthflattener. Take the worst case, regardless of other factors, and multiply it up. That’s how the model works?

earthflattener
24 days ago
Reply to  GerryM

That is how the worst-case scenario works… which is what the original figures were about, i.e. what would happen if nothing was done? The information we have is still consistent with that having been a possible outcome.
I think I’m beginning to agree that those original estimates should not have been released, as it is clear from most of the people passing through here that, apart from the mathematically literate, people completely misunderstand them. The sad thing is that then we must depend more on controlling the information given to people – which is what you guys seem to be against – but you throw a complete strop when presented with it.

Mike Whittaker
22 days ago
Reply to  GerryM

It’s called a sanity check. At school, kids are now taught to use an estimate, to check their detailed solutions are in a reasonable range.

Brian Sides
25 days ago
Reply to  earthflattener

Most of the deaths in the US have just been in two areas.
When the towers collapsed on 9/11, the dust, containing asbestos and much else, covered an area of NYC. Many rescuers and firemen have since died from exposure to the dust, and many residents were affected. I wonder if this is why there is such a high death rate in NYC. Other areas with high air pollution, like Wuhan and the parts of Italy where they have an aged population who smoke a lot, have had high death rates.
The trouble with extrapolating from the worst-hit area to the whole of the US is that it will not give you an accurate prediction.
No one can say what would have happened if no lockdown had occurred.
In Japan, as they did not want the Olympics postponed, they did not lock down; only after the Olympics was postponed did Japan start taking precautions and reporting deaths. In many Asian countries it is common to see people wearing face masks – I have seen it in Beijing, Hong Kong and Manila, which all have bad air pollution. Maybe the common wearing of face masks helps. We do not know.

Philip Oakley
23 days ago
Reply to  earthflattener

UK, 67m people. Prediction of deaths 0.5m. Allow 60% infected for herd immunity => 1.2% covid death rate. Well within reasonable range for death rate.
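That back-of-envelope check, written out step by step (the inputs are the ones stated above, not official figures):

```python
population = 67_000_000        # UK population, as stated above
predicted_deaths = 500_000     # the headline worst-case prediction
herd_fraction = 0.6            # assumed share infected at herd immunity

infected = population * herd_fraction        # 40.2 million infected
ifr = predicted_deaths / infected            # implied fatality rate
print(f"{ifr * 100:.1f}%")                   # ~1.2%, matching the figure above
```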

All the rest is, essentially, code bloat. Now bugs are proportional to bloated code size. Doesn’t change the underlying issue that the politicians needed to address. They just needed to show a bigger code base 😉

Software testing at a unit level is not correlated with functional level issues. Never has been.

Mike Whittaker
22 days ago
Reply to  Philip Oakley

Gives assurance that changes do not break basic functions, though.

The Phantom
20 days ago
Reply to  earthflattener

“Just scale up the deaths in NYC to the US as a whole…”

Is that reasonable? Are the physical conditions of high population density, enclosed subway cars, elevators and apartment buildings present in NYC continuous across the United States?

No. The conditions are not similar. Podunk Iowa does not have a subway system. Or a bus system, even. They have a Subway Sandwich shop down on Main Street. Currently closed, because of the penchant some people have for pretending you can extrapolate infection numbers from NYC to Podunk Iowa.

I saw a study the other day measuring COVID-19 antibodies. The results were instructive. ~21% of the sample from NYC were positive for antibodies, 14% positive from White Plains NY, a suburb north of the city proper, and 3% from rural towns north of that going up the Hudson Valley.

That’s how much you can’t extrapolate infection rates from NYC to anywhere else. The difference of 21% to 3%.

It also shows the futility of contact tracing in NYC, and the futility of shutting down everything in rural Upstate New York.

Growltiger
25 days ago
Reply to  earthflattener

I am afraid that it is pretty clear that “the science” here has been self-smearing. Ferguson himself flagged the issue when he failed to simply post the “15,000 lines” of C code. Nothing difficult or even time-consuming about putting a file on Github. A team that could bother to correct “houshold” could have found time to post the file. None of the excuses stack up.

Whether the model continuing in use was the same as the one that generated Paper 9, and whether the results used in Paper 9 were reproducible would then never have been at issue, unless it deserved to be.

What he should have said is something like: “This is code that has been written and re-written on the fly for many years, and not documented because the team all understand it, but in the interests of transparency I am posting it. It is being brought up to date and properly documented with the help of Microsoft and other independent coding experts.”

That would have been honest, transparent and the end of the story. Unfortunately, the way that this has subsequently been handled is the way of men in a hole, who go on digging.

There are real difficulties for Ferguson out in the epidemiology, if we hack free from the weeds of whether he and his team can write C++ to the standard that would be expected of a team of actuaries. The more serious questions are whether the exercise was unacceptably question-begging and whether it got reasonable answers out of reasonable inputs. Complexity and lack of transparency have made this hard to answer, but probably the jury will say that a vast shopping trolley of 280-some inputs, most of them reasonable but some questionable (like the IFR of 0.9%), produced an output that was wrong by an order of magnitude at least, where the influence of inputs could not be analysed properly, and where the policy response made things worse. At the very least, it should be recognised that the outcome, in terms of deaths, may well be no better than the 250,000 predicted from continued mitigation. But perhaps the next move is to be told that the lockdown was just mitigation with a few police drones attached, as it made no difference to the area under the curve?

thelastnameleft
25 days ago
Reply to  Growltiger

” The more serious questions are whether the exercise was unacceptably question-begging and whether it got reasonable answers out of reasonable inputs. ”
————

You write that, *here*? lol

thelastnameleft
25 days ago
Reply to  earthflattener

Thanks for your efforts. One can see from the negative responses to your posts that what is important here is the pursuit of a political agenda, not an academic inquiry into the merits or otherwise of a codebase, as you correctly state.

earthflattener
25 days ago

Thanks….appreciate yours too. There have been a few good and honest contributions. I’d single out dr_t, who said early on that he ‘probably doesn’t support the politics of Ferguson, or certainly not of his lover’, yet gave a brilliant and clear denunciation of the analysis made by the author of the critique.

malf
22 days ago
Reply to  earthflattener

I have to disagree. Your responses, and dr_t’s, and this guy’s all reek of the arrogance of academia—where there’s no need to produce results or compete in a free market because there’s tenure and money coerced from people via taxation.

You substitute this condescending “oh, let me tell you how it really is,” and the only people who agree with you are other “old boys” who have the same outdated, feudal mentality.

The author is suggesting that we look at this as software, and you and dr_t are arrogantly saying “tut! tut! we cannot look at it as software, we must remember that academics run under different rules than hoi polloi…”

youreamoron
13 days ago
Reply to  malf

Malf, they’re saying it doesn’t matter. You obviously don’t understand it and the people who are pretending to are just making things up to make you angry at the government so you vote for the people they want you to vote for.

Brian Sides
25 days ago
Reply to  earthflattener

I agree there is nothing wrong with straightforward C code.
I do not know what code analysis tools John Carmack used, but they could not be very good.
You have variables entering subroutines as signed integers: if 2-byte integers, they could have any value from -32,768 to 32,767; if 4-byte integers, anything from -2,147,483,648 to 2,147,483,647. These variables are not checked to see if they are within the expected range; they are just used in calculations or loops.
You cannot trust that the variables entering a subroutine will be in the expected range.
If the variables enter the subroutines outside the expected range, the results could be very unpredictable, possibly causing a crash, a lockup or a ridiculous calculation.
Anyone who defends this code does not understand robust programming.
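The kind of defensive check being described looks something like this (a Python sketch of a C concern; the routine name and the 0..MAX_DAYS range are hypothetical, not taken from the actual model):

```python
MAX_DAYS = 1000   # hypothetical upper bound for this routine

def infections_for_day(day):
    # Validate the argument before using it in any calculation or
    # loop, instead of trusting the caller to pass a sane value.
    if not 0 <= day <= MAX_DAYS:
        raise ValueError(f"day={day} outside expected range 0..{MAX_DAYS}")
    return day  # placeholder for the real calculation

print(infections_for_day(128))   # in range: accepted
try:
    infections_for_day(-5)       # out of range: rejected loudly,
except ValueError as e:          # not silently miscomputed
    print("rejected:", e)
```

Failing fast at the boundary turns a silent garbage result into a diagnosable error.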

IrishDevilFish
17 days ago
Reply to  earthflattener

Hi earthflattener, I was just wondering if you are weshinsley on github, as your above comment:

“Seeing as you have quoted John Carmack’s twitter, I hope that you find his comments encouraging when he writes…”

is the same verbiage as the response here:

https://github.com/mrc-ide/covid-sim/issues/144#issuecomment-625151694

Ronnie101
26 days ago

It’s always easier to take code that works and optimise it, than to take code that’s been optimised and make it work.

Back in the 1990’s there was a joke:
Q. Which is faster, Visual Basic or C++?
A. Visual Basic… by 6 months!

Mike Whittaker
22 days ago
Reply to  Ronnie101

Which will have unit/regression tests? C++!

al45tair
26 days ago

When you say that they’ve been “avoiding peer review”, I don’t think that’s really true — I don’t think scientists or other non-software-engineers ever really expect their code to be reviewed by professional developers (or even, frankly, skilled hobbyist developers). My guess is that your remarks have come as something of a shock to the epidemiologists who worked on this software. This new group of peers they’ve found reviewing their work is not the same as the set they’re used to dealing with, and the peer review they’re used to happens primarily behind the scenes, prior to publication — as opposed to publicly, which is what’s happening here.

Tom Welsh
25 days ago
Reply to  al45tair

There are two different types of review that should have been done. As far as I can see, neither was even considered.

1. Scientific peer review of the model(s) used. This should have been done, ideally, by opening up the model(s) to anyone at all, and accepting all sensible and informed feedback. Failing that, at least a couple of dozen leading scientists should have been asked to do a thorough review.

2. Software review – an entirely different matter, requiring wholly different knowledge, skills and tools. Its purpose is to ensure that the software correctly implements the final model(s) after review and correction, and also that the software itself is correct and bug-free.

That Imperial is a university, and whether or not it has a large staff of experienced software engineers, etc., are absolutely beside the point.

If it was not equipped for the task, it should not have undertaken the task.

GerryM
22 days ago
Reply to  al45tair

I think you’re right to the extent that academics start out writing software to help them with their research and basically don’t apply any of the disciplines of software development to their programs. Which is OK until they take their software into the public policy debate. I don’t know the history of this software, but my guess is it started as a project for forecasting epidemics with no intent for its use in the public domain, but progressed to being a tool for forecasting by politicians without going through the stages of specification, design, documentation and verification by third parties – a red team if you like – just trundling along with bits added over a thirteen-year period.

Mike Whittaker
22 days ago
Reply to  GerryM

It would make sense to at least run e.g. Coverity over it and address the issues brought up.

Andy
19 days ago
Reply to  al45tair

I don’t think anyone really cares if the code is to exquisite standards of readability and style. A code review can take a horrible mess of code and still review it for correctness. Obviously it’ll be harder, and will probably find more low-level bugs in such a codebase. Academics can be given a pass for writing code like that – they should do better, but they’re academics, not professional software engineers, after all.

But the correctness of the code really matters: all the overflows, inconsistencies and floating-point rounding issues should be pointed out and resolved. The model should also be checked, by other scientists too.

Brian Sides
26 days ago

First, thanks for investigating this and bringing it to the attention of members of parliament.
Let’s hope that this will reach the mainstream. There should be an investigation into why there was not the required due diligence, considering Neil Ferguson’s track record of bad predictions.
I am a retired software engineer, employed for some 35 years to write computer software.

When I first found out about Neil Ferguson’s tweet I could hardly believe it.
Documenting your code is one of the first things you learn. Undocumented code would not meet the most basic test standards. All important code should be independently tested.

The test reports for the code should be released along with the version history.
The original code used should also be released.

The tweet from John Carmack says that the original code is a single 15k file.
But Neil Ferguson’s tweet says it was thousands of lines of undocumented ‘C’ code.
15k = 15 × 1024 = 15,360 bytes. If one allows an average of 20 bytes per line, 15,360 / 20 = 768 lines.
You would have to allow an average of only 7 bytes per line to reach thousands: 15,360 / 7 ≈ 2,194 lines.
Even allowing for blank lines, that is an improbably low average of bytes per line, as each typed character takes one byte.
So I think someone must have their facts wrong: either Neil Ferguson or John Carmack.

Neil Ferguson’s tweet says that he wrote the code 13-plus years ago. John Carmack’s tweet says the code had been worked on for decades. Unless the ‘plus’ is decades, both statements cannot be true.
We do not know what processor or operating system the code was designed to run on.
This would be relevant to the random number generator and to any threading or interrupts.

Neil Ferguson has said that he had to use certain assumptions based on the available data.
Obviously, if different assumptions were made, it would produce different results.

The new code has this file: Technical_description_of_Imperial_COVID_19_Model.pdf. This is a short extract from that document:
“From the above, every country has a specific mean infection-fatality ratio ifr_m. In our model, we will allow the ifr for every country to have some additional noise around this. Specifically we assume that
ifr′_m ~ ifr_m · N(1, 0.1).
Using estimated epidemiological information from previous studies [4, 5], we assume the distribution of times from infection to death (infection-to-death) to be the sum of two independent random times”
The example formula has the single-letter variable N. I detest single-letter variables. A variable should be descriptive, and you cannot search on a single-letter variable. I spent eight years writing mostly assembly code, with lines like “ld a, 5”, which means load the accumulator register with the decimal number 5. But the comment would tell you what the 5 represented, or the purpose of the line of code. Many lines of assembly code are required to replicate one line of a high-level language like ‘C’. So for a bit of typing, a meaningful variable name is much preferred.
In this short extract the word “assume” appears twice. They say when you assume you make an ASS out of U and ME.
If I were to write some computer code including a simple function to calculate how many chip shops there might be in my small town in twelve months’ time, and the data I supplied was that there was one chip shop last week and there are three chip shops this week, the program would predict a number of chip shops that would surpass the land space on the planet.
Any attempt to create formulas to make predictions – such as the likely deaths from a virus if certain action is not taken – requires simplifying the problem and excluding many possible factors. Any such predictions, when applied to the real world where there is no such simplification or removal of factors, would need to be treated with extreme caution.

Growltiger
26 days ago

I don’t get the point of the chart that purports to be an estimate of what the Imperial model would have said about Sweden. Since the Imperial model (in whatever state) is parameterised to a UK version of Sim City, it wouldn’t simulate Sweden terribly well, even if it worked for the UK. But, in any case, an estimate of what the model would have said is not an output of the model. Looking at the chart, there is not yet a meaningful conflict with what has happened in Sweden; for all we know, the red bit of “actual data” might grow into the giant bell curve to the right, as the red is only in the left tail of the histogram. (I don’t think it will, because there is a reliable official Swedish estimate of Rt currently being well below 1, but that doesn’t give this object any evidential status vis a vis Ferguson’s model). If the chart is some form of output from the Uppsala model, which predicted various rather different catastrophes (which would have happened by now, but haven’t) you should say so. This is disappointing.

Sue Denim
25 days ago
Reply to  Growltiger

It’s the Uppsala study, yes, which used the ICL codebase we’re discussing here. They re-parameterised it for Sweden as discussed in their paper:

https://www.medrxiv.org/content/10.1101/2020.04.11.20062133v1.full.pdf

“We employed an individual agent-based model based on work by Ferguson et al”

“Individuals are randomly assigned an age based on Swedish demographic data” etc

Given that the Swedish outbreak appears to have been in decline for some time it seems unlikely the red graph could start to match the yellow graph. Yes, in theory there could be a sudden massive second wave in Sweden, even after two months of pursuing something between the “moderate” and “do nothing” strategy. If that happens then I will look at these models with newfound respect, especially as they’d have correctly predicted something the data wouldn’t obviously imply and in the presence of severe bugs that corrupt the outputs. I don’t think it’s going to happen, though.

For those who want more details there’s a writeup of the Sweden/ICL case study here:

https://www.aier.org/article/imperial-college-model-applied-to-sweden-yields-preposterous-results/

forsyth
25 days ago
Reply to  Sue Denim

The code Uppsala used is here https://github.com/kassonlab/covid19-epi . I think it’s a re-implementation of ideas from the Ferguson et al papers, unrelated to any IC source code. Initialisation and some details are specific to Sweden. It’s roughly 4100 lines of C. There’s another model in Python.

forsyth
25 days ago
Reply to  forsyth

What’s interesting about it is that a separate implementation of a simulator, relying on the same theoretical model, also produces similarly poor predictions for Sweden: “… under conservative epidemiological parameter estimates, the current Swedish public-health strategy will result in a peak intensive-care load in May that exceeds pre-pandemic capacity by over 40-fold, with a median mortality of 96,000 (95% CI 52,000 to 183,000)”. Now, there’s a factor of 3.5 in the CI, and although we’re not quite half-way through May, Sweden’s current graph looks to be heading down; with 3,225 deaths so far, it would take every currently confirmed case (26,322) dying to get to just over half of the 52,000 lower bound on that 95% CI, let alone the 96,000 median. It doesn’t seem too likely. Assuming this code too isn’t broken, an obvious possibility is that the actual mathematics is fine, but the initial parameters are wrong. (That wouldn’t be surprising for a novel virus: by the time you’ve seen enough data to calibrate the parameters correctly, it’s too late to make useful predictions. Next time!) Alternatively, perhaps there is something critical about the way viruses spread that isn’t reflected or allowed for in the maths. It wouldn’t surprise me if it’s a mix of both. It needs a little experiment to explore.

Brian Sides
26 days ago

Re-reading the John Carmack tweet, I guess he did not mean the size of the file was 15k, but that there were 15 thousand lines of code in the file. That is a lot of lines of code in a single file, particularly if it was undocumented.
Breaking your code into smaller files with meaningful names is another thing you soon learn to do.
The idea that this file was worked on for decades without this basic step is hard to believe.
There should be a document describing each of the subroutines: what their expected inputs, outputs and limits are.
The subroutines should check that inputs, calculations and outputs stay in range, with error reporting or logging if they do not.

What I said earlier still stands. Neil Ferguson’s tweet says that he wrote the code 13-plus years ago. John Carmack’s tweet says the code had been worked on for decades. Unless the ‘plus’ is decades, both statements cannot be true.

Another Anon Talking Head
26 days ago
Reply to  Brian Sides

“13” is decades ago in just the same way that any number bigger than 1,000 is “thousands”. It’s fairly common but imprecise English – which you might expect in a tweet.

Brian Sides
25 days ago

I must be using different English: to me, 13 is not decades. 20 or more is decades; 2,000 or more is thousands. I have seen comments talking about 30-year-old code. I do not know where that information comes from.

Tom Welsh
25 days ago

“Fairly common but imprecise” is not what I would look for in a software program used to decide the fate of a nation.

ActuarialModeller
26 days ago

Sue – May I suggest that you compare what ICL has done against the professional standards that would have applied in the insurance industry to an actuary performing the same exercise? These standards can be found here:

https://www.frc.org.uk/getattachment/b8d05ac7-2953-4248-90ae-685f9bcd95bd/TAS-100-Principles-for-Technical-Actuarial-Work-Dec-2016.pdf

https://www.frc.org.uk/getattachment/c866b1f4-688d-4d0a-9527-64cb8b1e8624/TAS-200-Insurance-Dec-2016.pdf

For example:
“Implementations and realisations of models shall be reproducible”

“Models used in technical actuarial work shall be fit for the purpose for which they are used and be subject to sufficient controls and testing so that users can rely on the resulting actuarial information”

Sue Denim
25 days ago

Thanks for the links. As I’ve never worked in insurance I’d rather not do that myself – but why not contribute such an article yourself? It would be quite interesting to read.

malf
22 days ago
Reply to  Sue Denim

The scariest part of this is that what those two guidelines say is just common sense. If I am writing a function to model, say, how many eggs N chickens will lay in a period of T days, we can generalise that as f(N,T) = x. For every pair (N,T), x should be the same, for every run. It doesn’t matter if, internally, f involves half a dozen, a thousand or ten billion operations; at minimum, for f to be useful, f(N,T) should be constant – mathematically, that is the definition of a function: f(x) = y, not f(x) = y sometimes, z other times. Even if you’re using pseudorandom numbers this is true (only if you use some empirical source of entropy would it be false), and the reasonable way to do that is to have the seed as a parameter, f(N,T,seed) = x, so that you can vary the seed, but for a constant (N,T,seed) the output is always x.

The defense of this seems to me to reek of academic elitism, which In Real Life is generally done in pompous tones, e.g. a Prof talking down to a bright undergrad. And this happens: I have a carpenter I use. He tried to do a degree in science, and he scored just fine, but he was constantly being called into the Dean’s office for “arguing with faculty”. He called them out on things he thought were wrong. He eventually couldn’t take it, dropped out and became a carpenter. His son eventually got tested for some “gifted” program, had one of those off-the-charts IQs, was done with his undergrad by his early teens, etc. That’s got a large genetic component.

The big myth at play here is that Universities select for the best and the brightest. They do not. They select for people who are certainly not stupid, but who do not argue with their advisors, except in the collegial way that assures them tenure. The easiest way to get tenure is to just do well on your undergrad, do what your advisors say in grad school and be “kind.” The hollywood conception of academics as argumentative individualist mavericks is 100% mythology.

Myreal Name
25 days ago

What’s wrong with your real name? It doesn’t exactly aid digestion if the author wants to hide behind a pseudonym, does it?

FactsNotFaces
25 days ago
Reply to  Myreal Name

If they are basing their position on asking us to trust them then yes, anonymity detracts. But they are basing their position on showing us the code, their analysis and supporting links. If they want to protect themselves from political and personal attacks, that seems pretty sensible to me.

Christopher Scarpino
25 days ago

I have said from the very beginning that the group with the real data, the certified people and the resources were in the insurance companies. The field is called Actuarial Mathematics, and it requires participants to take a series of very difficult exams. This certification process – together with being in possession of the real data on births, deaths, accidents etc. – is the reason why insurance companies can predict the accident rate and/or death rates and incidences of morbidity from disease. If they get it wrong, they pay. So they have an incentive to get it right. A major center for insurance in the US is located in Hartford, Connecticut. It is about 2 hours from Boston. So all the policy makers here needed to do was just call them up….

What is surprising is that many other academic fields failed to question the crisis. In the midst of what was described as a worldwide pandemic, where were the academics in Data Science or Biometrics?

Brian Sides
25 days ago

One of my favourite films is Double Indemnity, a 1944 American film noir.
There is a scene where Edward G. Robinson, an insurance assessor, explains to his boss all the accident types and breaks down the classification of each type without catching his breath, finishing by downing a glass of water.

I have written safety-critical code for fire alarms used in hospitals and hotels.
But it was only when I was writing code for the gambling industry that the code was independently tested. With millions of pounds being bet, it had to be correct.
Once again, money more important than life.

Christopher Scarpino
25 days ago

At my college, I am going to request that we change the name of the department of “Data Science” to the department of “Data Silence.” As this would more accurately reflect the guidance they provided to society in a time of crisis.

Mike Whittaker
22 days ago

On what evidence?

Brian Sides
25 days ago

I have just started to look at the bug fixes you highlighted.
What absolutely terrible code:
https://github.com/mrc-ide/covid-sim/commit/facc5127b35b71ab9b6208961e08138f56448643
“Fix a bug where variables weren’t getting initialised to 0

x and y were starting off with whatever value the previous code had for
them. Seeing as they are only used locally, I’ve made local varibles for
them.”

Are you kidding? They had global variables named x and y. Well, good luck searching for all occurrences of those global variables in your 15,000 lines of code. Hope you do not mix them up with other local variables named x and y.

The short code snippet had variables i and a nested i, also t, x, y, s. I wonder what they do when they need more than 26 variables.

https://github.com/mrc-ide/covid-sim/commit/3d4e9a4ee633764ce927aecfbbcaa7091f3c1b98
Fix issue causing invalid output from age distribution code.

Yikes! Yes, that is kind of important!

https://github.com/mrc-ide/covid-sim/commit/581ca0d8a12cddbd106a580beb9f5e56dbf3e94f
Use calloc() to allocate memory.

This fixes an uninitialised memory usage when using schools as input.

Yes, always good to allocate memory properly.

https://github.com/mrc-ide/covid-sim/commit/62a13b68538c8b9bb880d15fdcf5f193fb27885a
changed another instance of houshold bug plus other changes

This is by far the least important. I would rather have a variable with a spelling, typing or abbreviation error than a non-descriptive or, much worse, single-letter variable.

To think that predictions from this terrible code had anything to do with the decision to lock down the UK is enough to make you cry. It is criminal; it makes me mad.

zebedee
19 days ago
Reply to  Brian Sides

I’ve managed to refactor out lots of x’s and y’s. It’s not hard and 15 kloc is fairly trivial to deal with.

Labomba
25 days ago

While you are at it, you should teach basic statistics to every second MD running antibody tests and committing major statistical errors, or point out the modelling errors (and lack of tests) on the other side, such as the Swedish CDC’s models.

But I guess there is no interest in that, as it does not support the thesis of the blog’s title.

If anything, this crisis has pointed out that science is dead, just as truth is, and that it will be misused, intentionally or unintentionally, to support whatever thesis one believes in.
“Scientist say you can’t trust science”.

We do not have polymath groups for crises like this. It is easy to point out the programming errors of scientists and modellers, but most programmers’ (and for that matter data scientists’) ability to understand even the simple statistical pitfalls is pretty low.

Sue Denim
25 days ago
Reply to  Labomba

My scepticism about the output of academic modelling is not political in nature. Although sero-surveillance is a different field to epidemiology, with different procedures and standards (i.e. it’s attempting to measure the real world rather than simulate it), there are good reasons to ask questions about that too. To prove that my interest in solid science is universal here’s a writeup in a similar vein to mine about the statistics behind a Californian sero-survey:

https://statmodeling.stat.columbia.edu/2020/04/19/fatal-flaws-in-stanford-study-of-coronavirus-prevalence/

NY
25 days ago
Reply to  Sue Denim

The Santa Clara county study has been updated recently to address some of the issues brought up by peer review. John Ioannidis states in an interview that their adjustments did not change the conclusion in a substantial way. You can read the updated paper at Medrxiv. I would also add that the general result – a large underascertainment of cases – has now been replicated in numerous serology studies from around the world.

Serology studies are by far our best tool to determine the true risk posed by Covid-19. Now I’m clearly not talking about you, but it is strange to me that the same people who are so quick to take epidemiological software models at face value are so quick to scrutinize and dismantle serology studies which are based on hard, real-world data.

Sue Denim
24 days ago
Reply to  NY

It’s good to know the paper was updated in response to critics finding problems, perhaps I’ll go catch up on the latest news around it.

I agree sero-surveys should be useful. In recent weeks some people (“sceptics”) attempting to figure out what’s going wrong with epi models have been suggesting there might be some non-antibody based ability to fight off the infection, as the medical assumptions in these sorts of models are really quite basic – of a level any high school student could understand. Given how divergent the results are from observed reality, and given that other models are also wrong by large amounts, an obvious explanation is that there’s some underlying problem with germ theory.

David C
17 days ago
Reply to  Sue Denim

Is it the germ or is it the host/field/terrain, Pasteur or Béchamp?
https://thevaccinereaction.org/2018/02/pasteur-vs-bechamp-the-germ-theory-debate/

“Western medicine fiercely protects the germ theory of disease, scorning and dismissing Béchamp’s ideas out of hand. There is no doubt that much of what Béchamp was able to determine has been supplanted now by scientific resources unavailable to him at his time, but that can also be said of Pasteur’s theories. Such a narrow view of disease misses the gist of Béchamp’s teachings: the importance of supporting a strong internal defense system to ward off disease and attain true health rather than relying on drugs and vaccines as a sledgehammer to treat symptoms and attempt to destroy germs.

Many disease-causing microbes are normally present in the body and do not cause disease as a matter of course but are kept at bay in people who have healthy immune systems. Other infectious microbes can spread from person to person via water, air, insect bites or exposure to infected body fluids and have the potential to cause serious complications in an immune compromised host.”

David C
17 days ago
Reply to  Sue Denim

“an obvious explanation is that there’s some underlying problem with germ theory.”

Yes, Pasteur’s germ theory and the epidemiology predictions based on germ theory assume that everyone exposed to an infected person is equally likely to themselves become infected.

Germ theory ignores the obvious fact, championed by Antoine Béchamp, that, all else equal, it’s overwhelmingly the sick and the weak who are at greatest risk from contagious disease. Healthy people tend to stay healthy while sick people tend to become ever more sick. But modern physicians are trained not to see the unhealthy lifestyles of the sick, because that’s “blaming the victim.”

Gov. Andrew Cuomo’s order to NY nursing homes https://nypost.com/2020/05/10/cuomo-was-wrong-to-order-nursing-homes-to-accept-coronavirus-patients/ that they must accept as residents people “infected” with the disease, without even the slightest pretense of ensuring they’d be placed in safe environments, seems like it was designed precisely to create clusters of sick people in order to get the epidemic started. And once they got it started, the lockdowns kept it going.

Mike Whittaker
24 days ago
Reply to  Labomba

Programmers and data scientists working in the statistical domain should at least know the “textbook” issues.

David
25 days ago

As an academic, what concerns me is that there has apparently been no (expedited) external peer review of Ferguson’s reports with their model results. Relying on a single model is also foolhardy.
Climate science uses multiple models with different assumptions to produce ensemble forecasts, but nothing like that seems to have been attempted for this public health crisis.

Growltiger
25 days ago

Earthflattener says that this discussion of the software is an attempt to smear the science. I am afraid that it is pretty clear that “the science” here has been self-smearing. Ferguson himself flagged the issue when he failed to simply post the “15,000 lines” of C code. Nothing difficult or even time-consuming about putting a file on Github. A team that could bother to correct “houshold” could have found time to post the file. None of the excuses stack up.

Whether the model continuing in use was the same as the one that generated Paper 9, and whether the results used in Paper 9 were reproducible would then never have been at issue, unless it deserved to be.

What he should have said is something like: “This is code that has been written and re-written on the fly for many years, and not documented because the team all understand it, but in the interests of transparency I am posting it. It is being brought up to date and properly documented with the help of Microsoft and other independent coding experts.”

That would have been honest, transparent and the end of the story. Unfortunately, the way that this has subsequently been handled is the way of men in a hole, who go on digging.

There are real difficulties for Ferguson out in the epidemiology, if we hack free from the weeds of whether he and his team can write C++ to the standard that would be expected of a team of actuaries. The more serious questions are whether the exercise was unacceptably question-begging and whether it got reasonable answers out of reasonable inputs. Complexity and lack of transparency have made this hard to answer, but probably the jury will say that a vast shopping trolley of 280-some inputs, most of them reasonable but some questionable (like the IFR of 0.9) produced an output that was wrong by an order of magnitude at least, where the influence of inputs could not be analysed properly, and where the policy response made things worse. At the very least, it should be recognised that the outcome, in terms of deaths, may well be no better than the 250,000 predicted from continued mitigation. But perhaps the next move is to be told that the lockdown was just mitigation with a few police drones attached, as it made no difference to the area under the curve?

g00se
25 days ago
Reply to  Growltiger

Absolutely. Moreover, I think this debacle shows we need legislation:
a. All Government code to be GPL
b. An international standard created such that all execution paths, including those generated by stochastic techniques, be required to log their values, so that their consequences, and the reproducibility thereof, can be easily determined
This is of course increasingly urgent as our lives become increasingly dominated by algorithms.

Mike Whittaker
24 days ago
Reply to  g00se

Do you know what the GPL entails?

Mike Whittaker
23 days ago
Reply to  g00se

Including the Cambridge Analytica codebase that produced the Brexit result ….

stephenfisher
25 days ago

Upon reading both articles and all the comments I think the real challenge is to get Professor Ferguson to write down his model rather than release code.

Most contributors here are perfectly able to code up a simple set of difference equations once they are presented concisely.

g00se
25 days ago
Reply to  stephenfisher

The problem there is that the code IS the model. Of course, it goes without saying that professional quality code would have any equations used therein properly isolated and commented.
RELEASE THE ORIGINAL CODE

stephenfisher
25 days ago
Reply to  g00se

That is my point.

It seems that the research group has just made changes to code which may or may not be an accurate reflection of the interactions of virus with host, and that stale/incorrect code remains intact despite having dubious reasons to belong. Ad hoc characterisations of the effects of masks/distance/other actions hang around in the code, with new variables conflicting with old definitions. Etc., etc. The whole thing is a mess because basic research discipline has not been followed: document the theory, document the empirical facts, document the model, then, and only then, start coding.

The Imperial College group don’t even know what is in their model!

Growltiger
25 days ago
Reply to  stephenfisher

There is an obvious contrast with the Gupta model, which was presented as a set of equations, with not too many independent variables. I am afraid that the difference equations are buried within the Imperial model because they might be thought too simple to be allowed out on their own.

stephenfisher
25 days ago
Reply to  Growltiger

If the Imperial model turns out to be simple then that is a great outcome. My suspicion is that it will turn out to be wrong.

thelastnameleft
25 days ago

This ‘critique’ is even worse than the original effort it seeks to supplement.

—–
“A few people have claimed I don’t understand models, as if Google has no experience with them.”
—–

Now the author *is* Google, not merely a former employee (of unknown position).

How can anyone take this seriously?

—–
I was glad to see the analysis was read by members of Parliament.
—–

I bet you were, Sue. That was the entire point, of course: to bypass those whose expertise is relevant and get straight to those whose uninformed opinion matters more and whose ignorance is more easily swayed by your inadequate ‘critique’. The entire purpose of this endeavour is obviously to manipulate public opinion in favour of undermining the rationale for lockdown.

Ever been had?

Anothermodellerjoinsthefray
25 days ago

Joining this thread very late, as a professional modeller of 20 years’ standing, now immersed in COVID-19 descriptive/predictive modelling…

Sue Denim’s original post raises a key issue of relevance to any computer model…how do you know if it is working as intended?

A model is only ever a representation of reality. The requirement in the public sector policy arena is generally not that it should be ‘right’, but that it should be ‘right enough’ to make defendable policy decisions. In the case of the ICL model, I would want to know the basis on which the modellers were claiming that their model met the standard needed to direct Government policy in the way that appears to have happened. At the heart of this, there should be three tests:

1) Does the model use the right methodology (is the epidemiology model appropriate)?
2) Does the computer model do what the methodology dictates should be modelled (i.e., has the epidemiology model been coded correctly into the computer model)?
3) Has the model been populated with the right inputs and calibrated (i.e., has it used real world data and does the output accord with the reality of what we actually see?).

With these tests in mind, I think Sue Denim alludes to two central issues: first, universally, all models must be capable of being tested in a way that is replicable in order to show that the logic is working as intended with the given inputs (does anyone seriously disagree with this?); secondly, that it is really, really bad practice to recode/repurpose models (especially ones built like ICL’s) where you are unclear about the purpose of the original logic and especially so when you cannot see whether or not your new logic is interacting correctly with the original logic (or, specifically, not interacting, if that is the intention) (again, does anyone really disagree with this?). If you ignore these two things, how can you ever know whether or not your original / updated / patched code is working as intended?

In this case, we appear to have neither a proven epidemiology model (specifically for COVID 19), nor a computer model that can be shown to have been coded with the preferred manifestation of the epidemiology model (and seems to be non-deterministic as well), nor valid data with which to populate the model (instead relying on the data from other countries). This suggests to me the need for some pretty robust and objective challenge (and this is what I do every day with my own modelling team and my clients’ models). Of course, the ICL model may still be found to be fit for purpose (dire code and all), but the risks of this not being the case would seem to be material. So, if this model is to be used for decision-making, the risks should at least be understood and accepted. Just because the ICL team is comprised of publicly-funded (underfunded?) academics doesn’t alter this.

On the specifics of the ICL model, much of the debate seems to be about the use of stochastic (and recursive) modelling and whether or not such modelling can or should be subject to testing. I didn’t pick up from the original post any conceptual disagreement with using (or misunderstanding of the use of) stochastic modelling per se, but some of the posts instead seem to be saying that Sue Denim is wrong to say that the ICL model should be deterministically testable.

I find this a very strange argument. It must always be possible to frame a set of tests that demonstrate that a model is working as intended to a given standard (i.e., where the recipient can set the test threshold wherever they like), regardless of the code methodology. The fact that a stochastic model is running more iterations than can be individually tested doesn’t alter this – the logic of an individual iteration is deterministic (even in a predictive model using chaos theory logic) and that logic must be testable with test inputs at some level. So, the question then really is: how much unit testing do you do to satisfy yourself that the system is working as a whole if you cannot construct a replicable deterministic test on the whole system? That is what I think Sue Denim is getting at. Put simply, in simulation world, you must test enough sample iterations to be comfortable that the logic can be abstracted to run generally in a simulation (and where you practically cannot test every iteration). Once you start simulating using random numbers, the integrity of the test simply requires that you keep the seeds to be able to test a reference set of specific iterations, which MUST generate precisely replicable answers (does anyone really disagree with this?). I didn’t think Sue Denim was claiming that each simulation must replicate precisely – this is clearly impossible – rather that you must be able to construct a set of tests at the level of the iterations to infer to the required level of confidence that the overall simulation is working as intended. In other words, it is absurd to suggest there is an intrinsic feature of a simulation model that renders it immune to testing or generically permits non-determinism.
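In code terms, the seed-keeping discipline described above amounts to something like the following minimal sketch (a toy iteration in Python; `run_iteration` and its parameters are invented for illustration, not taken from the ICL code):

```python
import random

def run_iteration(seed, n_people=1000, p_infect=0.03):
    # One iteration of a toy epidemic step. Given a seed, the result is
    # fully deterministic, so a reference run can be replayed exactly.
    rng = random.Random(seed)
    return sum(1 for _ in range(n_people) if rng.random() < p_infect)

# Keep the seeds of a reference set of iterations...
reference = {seed: run_iteration(seed) for seed in (1, 2, 3)}

# ...and every later replay MUST reproduce them precisely.
for seed, expected in reference.items():
    assert run_iteration(seed) == expected, f"iteration {seed} not replicable"
```

If a bug fix legitimately changes the outputs, the reference set is regenerated deliberately, with the change documented; what must never happen is the same seed silently producing different answers.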

I think this is where I am struggling with the posts that seem to imply that not understanding the code is perfectly okay, because (I paraphrase) ‘it will all come out in the wash through averaging the simulation iterations’. This is nonsense and, frankly, suggests a pretty profound misunderstanding of both the statistical basis of simulation and modelling good practice. Averaging of this nature is fine only when you can show that your code does not unwittingly contain systemic bias (wrong logic or wrong inputs) that causes every iteration to be over or understated (which obviously does not average out). In the context of the ICL model, this means always (and erroneously) generating forecasts in the range of, say, 150,000 to 300,000, when they should, correctly, be in the range of, say, 20,000 to 100,000. You must be able to frame tests that demonstrate that your outputs are consistent with the intended logic. If you really don’t know how your code interacts with the inputs and can’t test it, how can you possibly show that your average forecast is not biased? Or justify your confidence intervals?

This problem is compounded when you pick up and try to re-purpose someone else’s model and where you don’t understand the original model. The context of the new problem can easily render the original model completely unsuitable for being repurposed. It sounds to me as if the ICL model has been held out to be a generally abstracted solution for pandemic modelling when no-one really knows whether or not this is valid. Moreover, as noted by others, the model has not been built to include the specific policy interventions applied by the UK Government, so it couldn’t be validated ex-ante and can’t now seemingly be validated ex-post.

The key learnings for me from this are i) that Government might have done better simply to recognise the inherent limitations of models and asked the right questions before taking the models on trust and ii) that policy makers must be massively more diligent in their pandemic planning about how they frame testable pandemic hypotheses – this including setting themselves up to be able to gather relevant data from the outset (and this might require GDPR to be bent a little!), allowing predictive hypotheses to be tested much more quickly and effectively, so as to generate more timely and useful policy insights.

The above aside, being pragmatic about all of this and in defence of poorly documented code, who hasn’t done this from time to time? I don’t know any modeller or programmer who specifically cleanses each version of their model to ensure that it contains only the code for the current formulation of the problem they are solving. Coding is an inherently iterative process and most coders will retain a certain amount of redundant and partially developed code because it might be useful again later. The key things, of course, are to document both the overall code schema (the logic blocks and how they are intended to interact) and the detailed code itself (the live code and the ‘parked’ code)…and submit your work for independent review by people who are qualified to test it.

My sniff test for all of this, born of experience, is simple: if you put down your model for a month and then go back to it, if you do not pretty well immediately understand what you have written (with recourse to the documentation, obviously), then you have written poor code and/or failed to document it properly. It really is that simple. And when you do write really efficient, clever code, that only emphasises the need to document what the code is doing and why it works, especially if you are sharing that code with others who might have to adapt it. Put another way, my clients wouldn’t accept models that were effectively disclaimed along the lines of “Sorry about the unintelligible code…didn’t have the time or knowledge of good practice to write it properly…I think it works but I don’t know really…can you pay me anyway?”

Reflecting on all of this, should academics be stripped of modelling? I think Sue Denim has the right insight (leave modelling to modellers) but the wrong conclusion. Poor modelling happens absolutely everywhere – government, industry and academia, largely through a failure to understand that doing it well is a skilled discipline. The private sector simply has the incentive to find problems more quickly, because they are burning their own money if they don’t, but this does not mean that they don’t have just as much poor modelling as anywhere else. Modelling should be done by professional modellers who i) practice enough to be competent at it, ii) have an aptitude for it and iii) are in a peer environment where good practice coding and testing are the norm. I am delighted if people want to become modellers…but please don’t pretend that high quality modelling is something that anyone with half a brain can do simply by buying a software licence. So, in the context of academia, I would still give money to grant recipients for modelling, if they can demonstrate an environment that sustains practice competency.

Finally, the above aside, I do wonder if we are overstating the impact of the ICL model. Regardless of what might have happened to national infection rates without lockdown, even with lockdown I am led to believe from talking to friends on the front line in the NHS that COVID 19 brought our hospitals close to being overwhelmed at infection rate numbers an order of magnitude less than the ICL model predictions. And it has been a terrifying experience for many NHS workers. Lockdown was going to happen anyway, with or without the ICL model.

g00se
25 days ago

>> In other words, it is absurd to suggest there is an intrinsic feature of a simulation model that renders it immune to testing or generically permits non-determinism.<<
Indeed it is absurd. But that is what some people here seem to have been stating. I can only conclude that this is either serious ignorance of software design or more attempts to throw sand in the eyes of people involved in this serious discussion.
RELEASE THE ORIGINAL CODE

Growltiger
25 days ago

Everything said about the disciplines applicable to modelling (and to advising Government on the trillion dollar question of the day) is highly convincing. As is the observation on the difficulties of the front line NHS with patient numbers an order of magnitude lower than predicted (for mitigation, at any rate, although at one stage Ferguson was heard telling a Select Committee that the numbers might be lower still, given lockdown). It makes one wonder if the best justification for going slowly at this point is the state of the NHS troops, who really need the relief of near-zero cases before things can normalise. And more so if there is to be a second wave.

Mike Whittaker
20 days ago

Many commenters here seem to have a “freedom means freedom” agenda where a lockdown is an affront, nay, an outrage.

Cristi.Neagu
25 days ago

The willingness of the government to accept the output of this model means one of two things:
1. The model said what the government wanted it to say, and would therefore enable their plans. This means the government is not working for the people, but against them, and is therefore not fit for ruling.
2. The people in government do not know or care how the data they use is created or where it comes from, and even worse, they do not have advisors capable of making that judgement. This means the government is criminally incompetent and is therefore not fit for ruling.

As a side note, I wonder what this spectacular display of ineptitude from a respected institution means for climate models. Maybe an investigation should be carried out into those as well, considering they are being used to make policy decisions.

GerryM
23 days ago
Reply to  Cristi.Neagu

OK, what would you have done as PM if a published paper from a major university forecast 500,000 deaths if you didn’t introduce a lockdown? Ignore it, with the entire press and broadcasting media already baying for your blood? The government didn’t have a choice; the blame lies with the modellers and the hysterical anti-government press coverage. This was the time to pull together and we didn’t do it.

Andy
19 days ago
Reply to  Cristi.Neagu

FYI, Michael “hockey stick” Mann was in a court case for libel against Prof Tim Ball until relatively recently. He lost, mainly because, when asked by the court to show his data, he refused. So now millions of dollars of damages await, because he’d rather pay than show his workings.

dr_t
25 days ago

It is probably very comfortable to publicly criticise someone else on the grounds of a lack of transparency (like calling on Ferguson to publish his original code) while hiding behind the anonymous and non-transparent “Sue Denim”. But do you think it might be a tad hypocritical?

The one party which has been most eagerly pushing the idea that this virus is harmless, nothing to worry about and that we should just let it wash all over us, right from the start, is the Communist Party of China.

As I’ve said before, I don’t think assessing the quality of software engineering is the most important aspect of assessing Ferguson’s work. But discussing the quality of his programming is one thing, while trying to discredit his model in order to try to re-impose the discredited ‘herd immunity’ approach, which is precisely what some are now doing, is another. The damage this would do to us would benefit only one party – China and its communist shills embedded in the West.

Without knowing who you are, how can anyone independently assess who you are linked to and what your hidden agendas might be, if any?

g00se
25 days ago
Reply to  dr_t

>>Without knowing who you are, how can anyone independently assess who you are linked to and what your hidden agendas might be, if any?<<
Why not just assess Sue Denim's arguments on their merits alone?

To equate the deliberate hiding of Ferguson's original code, information of great national interest, paid for BY the nation, with one (wo)man's choice to hide his/her identity is preposterously disingenuous. Do you really think people are going to fall for that sleight?

dr_t
25 days ago
Reply to  g00se

I have already addressed Sue Denim’s arguments on their merits, or lack thereof, as the case may be.

As “Denim” has said, MPs have been reading “her” arguments. This is not a private forum. Attempts are being made to use this tear down of Ferguson to influence policy and re-introduce the ‘herd immunity’ policy.

I think it is in the public interest to know who is influencing MPs, or trying to, what their agenda is, and who they are connected to.

This is a legitimate question, not a sleight.

Mike Whittaker
24 days ago
Reply to  dr_t

Cummings and the “Leave” backers?

Mike Whittaker
22 days ago
Reply to  dr_t

Also, given the targeting of Ferguson by the right-wing press such as the Telegraph …

malf
22 days ago
Reply to  dr_t

No, you are wrong.

It would only be legitimate if we were being told that Sue Denim is an expert. Being “an anonymous professor of biology at Harvard” (or some university) would be one thing, because the whole point of feudal titles like “university professor” is to cause the holders to be treated deferentially, like so many Lords and Ladies.

Sue Denim has more or less just said “I’ve been paid to be a computer programmer at Google,” but that’s hardly a big claim to fame, and the argument would be the same whether or not the individual had been employed at all.

The reason academics are obsessed with titles and “lack of anonymity” is because their whole schtick is that their titles and degrees make them “more important” or at least “more authoritative.” Thankfully, computer programming is one, perhaps the biggest, domain in the world where titles don’t matter, results do.

There’s no agenda here other than saying “if I were a paid programmer, this code would be considered embarrassing.” I mean, as others have pointed out, we have functions where the variables are named things like a, t, b, x, y, i. That’s unacceptable, it’s simply the _wrong way to program_, and it is the wrong way for anyone to program, professional, hobbyist or amateur. It’s little errors like this that make it difficult to take someone seriously, no matter what their titles.

GerryM
23 days ago
Reply to  dr_t

There are only two cures for a pandemic: a vaccine and herd immunity. The lockdown was to prevent the NHS being overwhelmed; the area under the curve remains the same. In other words, in the absence of a virus, we will still get the number of deaths it will take for herd immunity to kick in, regardless of what steps we take.

GerryM
23 days ago
Reply to  GerryM

Oops, “in the absence of a vaccine”.

Denim Sue
25 days ago

Your initial analysis comes across as extremely demonising. The choice of language veers towards sensationalism rather than factual analysis. That said, your second round of analysis hasn’t addressed the main argument: this is a stochastic model, where the ensemble average is reported. Can you do a set of 1000 runs and then report the results? What is the shift in the average? My anonymous guess is that the average will show a negligible shift.

These are the actual questions that need to be answered before people start berating and demonising the extremely underpaid and overworked scientists at ICL.

Furthermore, your background of working at Google proves that you were a software developer. It doesn’t prove that you have an actual understanding of how mathematical models work.

Random Forest
25 days ago
Reply to  Denim Sue

If I write a stochastic ensemble model that purports to emulate the behavior of a fair coin, but I screw up my code so thoroughly that every run produces 1/3 heads and 2/3 tails, there is no amount of averaging that will give me 0.5 for the coin’s fraction of heads.

Averaging does nothing for bias or other SYSTEMIC forms of error. If the model under discussion is systemically flawed, averaging just makes us feel better about ourselves–it doesn’t give a better model estimate of anything.
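To make that concrete, here is a minimal sketch (Python; `buggy_coin` and `estimate_heads` are hypothetical names invented for illustration):

```python
import random

def buggy_coin(rng):
    # A "fair coin" with a bug: heads comes up only 1/3 of the time.
    return rng.random() < 1 / 3

def estimate_heads(runs, flips_per_run, seed=0):
    # Ensemble average over many runs of the buggy model.
    rng = random.Random(seed)
    run_means = [
        sum(buggy_coin(rng) for _ in range(flips_per_run)) / flips_per_run
        for _ in range(runs)
    ]
    return sum(run_means) / runs

# Averaging 1000 runs does not rescue the bias: the estimate sits near
# 1/3, nowhere near the 0.5 a fair coin should give.
estimate = estimate_heads(runs=1000, flips_per_run=100)
assert abs(estimate - 1 / 3) < 0.01
```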

Sorbitol
25 days ago
Reply to  Random Forest

I accept your premise. Yet even on your premise, to demonstrate a flawed model it must be shown that the result is flawed across a large number of replicates. In this case, the author of this post is not demonstrating a flaw in the model; they are demonstrating flaws in the coding. These are two very different things. Based on the bugs present in the library, they are presenting the premise that somehow >50% of N (where N is a very large number) individual distributions are all affected by the myriad of bugs that have been found. That has simply not been demonstrated.

As to the model being flawed: if it were flawed, it would not have achieved consensus with the myriad of other models currently out there.

So without any demonstrated flaw in the model it is just a code review. And until the first is fulfilled this is just a witch hunt against a group of under-paid and overworked people who do their jobs because it’s their passion.

Brian Sides
24 days ago
Reply to  Sorbitol

Murphy’s law: anything that can go wrong will go wrong.

When you look at code, you look at what can go wrong. For example, when the data is read in there is much that can go wrong: you may only read part of the file, or some of the data may get corrupted. So you error-check the data as you read it; checksums or other validation methods let you verify the data.

As for the many successful uses of the program: we do not have audited test data to determine this.
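A minimal sketch of that kind of input check (a hypothetical `read_verified` helper in Python, assuming a published SHA-256 checksum for the data file):

```python
import hashlib

def read_verified(path, expected_sha256):
    # Read the input file and refuse to proceed unless the bytes match a
    # published checksum: a partial or corrupted read fails loudly instead
    # of silently feeding bad numbers into the model.
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch for {path}: got {digest}")
    return data
```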

malf
22 days ago
Reply to  Sorbitol

Under-paid? Most tenured professors make in excess of six figures, in Canadian/US dollars…

malf
22 days ago
Reply to  Denim Sue

Why are you being condescending? How mathematical models work is very simple: the mathematics might be complex, but the idea is not. Say I am going to write a function to determine how many eggs N chickens will lay in T days. I am going to write a function, f, such that

f(N,T) = x

Now, for f to be useful, for every pair (N,T), f(N,T) must produce the same output, that is, f(N,T) = x every time. If we’re going to introduce a pseudorandom number into the mix, then the appropriate way to do that is to pass it to f:

f(N,T,prn) = x

So, for every list (N,T,prn), f(N,T,prn) = x. Even if f uses prn to seed a pseudorandom number generator that generates a billion further pseudorandom numbers (PRNs), that set of PRNs will be constant for a given seed. This is the basic way you would go about writing modelling functions, whether they are of one variable, two variables, 200 variables, or 200 variables plus a random factor.

This is a high-level view, but it’s how it would work in any case where you were writing a function that was to be testable and useful. But the ultimate test of any model is: if I have 10 chickens, and every 14 days I get 8 eggs, but my model says I should have 5 eggs, or 20 eggs, obviously my model is wrong. So it’s not just a question of mathematical modelling, it’s a question of “does the model jibe with reality?” What we’re seeing with this model is that it doesn’t really map to reality, and now what we’re investigating, with the help of the source code, is _why_.
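The scheme above can be sketched in a few lines (a toy egg-laying model in Python; the function name and the 0.8 lay probability are invented for illustration):

```python
import random

def eggs_laid(n_chickens, days, prn):
    # f(N, T, prn): the pseudorandom seed is an explicit input, so the same
    # (N, T, prn) triple always yields the same x, even though the function
    # draws many further pseudorandom numbers internally.
    rng = random.Random(prn)
    return sum(1 for _ in range(n_chickens * days) if rng.random() < 0.8)

# Same inputs, same output, every time:
assert eggs_laid(10, 14, prn=42) == eggs_laid(10, 14, prn=42)
```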

Tom Welsh
25 days ago

‘The released code at the end of this process was not merely reorganised but contained fixes for severe bugs that would corrupt the internal state of the calculations. That is very different from “essentially the same functionally”’.

Unless of course the current, new and improved version is also full of severe bugs that would corrupt the internal state of the calculations.

In which case it’s just “plus ça change, plus c’est la même chose”.

Seriously, people who write supposedly safety-critical programs containing data corruption bugs and other severe defects that would corrupt the internal state of the calculations cannot be expected to fix everything and render a hopeless program correct in just one round of fixes.

Everything that is known about software defects, testing and debugging tells us that eliminating bugs is very hard indeed – even with the best available programmers, testers, methods, tools and process.

None of which Imperial seems to have – or even have heard of.

Many of the most brilliant names in the field of programming have emphasized the same point over and over.

Ugly programs are like ugly suspension bridges: they’re much more liable to collapse than pretty ones, because the way humans (especially engineer-humans) perceive beauty is intimately related to our ability to process and understand complexity.
– Eric Raymond

Simplicity is prerequisite for reliability
– Edsger Dijkstra

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.
– Edsger Dijkstra

The unavoidable price of reliability is simplicity.
– C A R Hoare

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
– Brian W. Kernighan

One could add many more…

Mike Whittaker
24 days ago
Reply to  Tom Welsh

You ought not use “Imperial” to represent the whole institution; it’s only the department originating the model and software that’s in question here.

malf
22 days ago
Reply to  Mike Whittaker

No, this is basically a systemic problem across academia. It’s much worse in the humanities, but this is all about the quality we get for the billions of dollars that are spent globally on universities. The quality simply isn’t there, but they get lots of $$$, so we all want to believe it’s quality, because we’re forced to pay them. Who wants to be forced to pay for something and simultaneously admit they’re being played by mediocre people? Academics are mediocre people: slightly above average in most cases; the Hollywood myth of the “maverick genius academic” is false. Most “geniuses” do exceptionally poorly in academia unless they come from rich families.

Phil Bull
25 days ago

Hi, “cosmology lecturer at Queen Mary” here. You are making lots of silly, baseless claims in these articles, which leads me to believe that you don’t actually understand scientific modelling or the sorts of ways in which these codes are actually used. I’ve tried to address some of your claims here, and am happy to clarify or go into more detail to explain (although for definiteness the actual authors of the Imperial code would have to be asked about the specifics): https://philbull.wordpress.com/2020/05/10/why-you-can-ignore-reviews-of-scientific-code-by-commercial-software-developers/

I really do think there’s more going on here though — your tone and the curious location of these posts suggests that this is actually just a hatchet job, designed to give cover for some political machinations that are just starting to play out in the press… It is a dangerous game to be playing given the risk of undermining public health outcomes.

Brian Sides
24 days ago
Reply to  Phil Bull

I have read Phil Bull’s article. He is wrong on so many levels.
He starts: “Many scientists write code that is crappy stylistically, but which is nevertheless scientifically correct (following rigorous checking/validation of outputs etc).”

Well, rigorous checking/validation is good, but has it been done? Where can we see the test results?

It is not just about coding style. There are reasons for good programming practices, mainly so that we humans do not make so many mistakes.

But the program needs to read the data without errors, not just on a good day. Without error checking or data validation, both absent in much of the cpp code, you are flying on a wing and a prayer.

He is wrong about documentation and much else. He thinks one big file is a good thing; I guess he has never done any serious programming. If the cpp files were presented by a first-year student, they should fail the course.

If they want to write crappy code, that is OK. Just do not present the results from the crappy code to the outside world.

malf
22 days ago
Reply to  Phil Bull

Why is it that every academic defending this sloppy programming wants to do the typical ivory-tower song and dance of “we have different rules…you’d understand that if you were one of us…”?

John
25 days ago

THE FOLLOWING GROUPS OF LINKS WILL SET FORTH EVIDENCE FOR THE CONNECTIONS BETWEEN

– flawed pandemic computer models

– provably exaggerated mortality statistics aimed at inducing fear and haste

– national economy-destroying lock-downs imposed by duped & compliant or compromised (or complicit) government leaders and their cabinets

– the impending introduction of digital national currencies linked to a global digital currency

– soon-to-be government-legislated mass-vaccination programs largely aimed at delivering indelible digital ID tattoos

– the harnessing of every thusly ID’d individual to a unique crypto-currency account connected to a centralized financial authority via the AWS cloud

– and more…

THE FOLLOWING GROUPS OF LINKS WILL SET FORTH EVIDENCE
https://pastebin.com/75CdVME4

g00se
24 days ago
Reply to  John

>>
Dr. Erickson COVID-19 Briefing – Full Video
https://www.bitchute.com/video/oGVRqleTzzMi/
<<
Thanks for this – highly interesting

RELEASE THE ORIGINAL CODE

NJC
24 days ago

I’m coming late to this invaluable thread and apologise if the following has already been widely commented on.
In a Sunday Times interview with Prof Ferguson on 29/3, the latter was quoted saying of his ICL model “the code is not a mess. It’s all in my head, completely undocumented”.
I had to read that twice to make sure I hadn’t dreamed it.

Brian Sides
24 days ago
Reply to  NJC

Thanks for the information. Do you have a link? Is it behind a paywall?

Sue Denim
24 days ago
Reply to  Brian Sides

That is unbelievable. At first I thought NJC must be paraphrasing or exaggerating but they aren’t.

The story is headlined “Neil Ferguson interview: No 10’s infection guru recruits game developers to build coronavirus pandemic model”

https://www.thetimes.co.uk/article/neil-ferguson-interview-no-10s-infection-guru-recruits-game-developers-to-build-coronavirus-pandemic-model-zl5rdtjq5

It can be seen without a subscription if you have a login. The quote comes from this paragraph:

“Yet for other scientists the big problem with Ferguson’s model is that they cannot tell how it works. It consists of several thousand lines of dense computer code, with no description of which bits of code do what. Ferguson agreed this is a problem. “For me the code is not a mess, but it’s all in my head, completely undocumented. Nobody would be able to use it . . . and I don’t have the bandwidth to support individual users.” He plans to put that right by working with Carmack and others to publish the entire program as an interactive website.”

This interview is dated March 29th.

Brian Sides
24 days ago
Reply to  Sue Denim

Thanks for posting this.

Simon Nicholls (sinichol)

Not much of your critique of process is wrong; but your conclusion, that the flaws in his process must have led to the wrong answer, is.

Good process just stops a rapid release cycle breaking code without you realising.

You need rigorous automated unit and regression testing built on top of continuous integration if you have 10s or 100s of people all working on the same code without a clear understanding of what the output should look like, and an aggressively commercial desire to move it forward.

Leading me to the most famous misquote of all time: it is not “move fast and break things”; it should in fact be “move fast and break what you don’t care about”. All code bases have redundant code you don’t rely on; efficient organisations don’t bother to maintain it, and use tests to allow them to rapidly refactor the code they do rely on.

A single developer with a limited need to have other people make rapid changes to their code is NOT going to produce the wrong answer just because they have weak processes. Sure, they will have to spend more time manually testing changes, and will find it hard to grow a team around the work, but they will learn these lessons organically as they grow.

You provide no evidence about the manual regression testing, or the enforced runtime conditions he would work to, or would require fellow researchers who might help him to adhere to (e.g. “only ever run it on this box in this way and check the output against this reference run’s saved state stored here”).
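For what it’s worth, the kind of reference-run check being described can be sketched in a few lines (Python; the stand-in `model_run` and the file layout are hypothetical, invented for illustration):

```python
import json
import random

def model_run(seed):
    # Stand-in for one model run: deterministic for a given seed.
    rng = random.Random(seed)
    return [round(rng.gauss(0, 1), 6) for _ in range(5)]

def save_reference(path, seeds):
    # Record reference outputs once, alongside the seeds that produced them.
    with open(path, "w") as f:
        json.dump({str(s): model_run(s) for s in seeds}, f)

def regression_check(path):
    # Re-run every reference seed and demand identical output.
    with open(path) as f:
        reference = json.load(f)
    for seed, expected in reference.items():
        assert model_run(int(seed)) == expected, f"seed {seed}: output drifted"
```

Run `regression_check` before and after every change; a failure then points at exactly the change that altered the outputs.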

Fundamentally, none of your critique addresses the central question.

Was the output wrong?

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf
Page 13…
… Total Deaths > 2.6 > On Trigger 400 > PC_CI_HQ_SD (code meanings page 6)
… predicted 48k deaths for this scenario.
… in more detail… https://medium.com/pragmapolitic/the-ferguson-lynch-mob-strike-de80ee95cd0b

It is about helping the man develop better processes (in so many ways) not lynching him…

duncanpt
24 days ago

1/ Thanks for reviewing this code and writing your articles.

2/ Cutting through a lot of argument about whether academics should develop code properly or not (I’m on the properly side BTW), it seems to me that as soon as this model moved out of academic navel-gazing into affecting Government policy it stopped being the Imperial team’s private toy and entered the realm of mission-critical, life-critical software. After all, decisions have been made that have saved or destroyed lives, either by the lockdown mitigating the pandemic, or by the economic crash that has entailed, or by excess deaths consequent on the government actions but unnecessary otherwise (aka “the cure is worse than the disease”).

3/ The conclusion seems very clear that the Imperial model did not conform to sufficiently rigorous standards* and should have been subject to much more review and confirmation before being allowed onto the SAGE table, let alone to drive Government policy.

4/ On a slightly different point, but nonetheless related: why in the past 15 years or 13 years (estimates vary) was this model not rewritten and tested so that it could reliably retrofit the previous disease outbreaks it’s been used on? That is, having got the foot and mouth, swine flu, BSE outbreaks etc. so spectacularly wrong, why did Ferguson et al not at least use those experiences to drag the model into the correct order of magnitude?

* Actually it appears to adhere to no standards at all worth speaking of, and the very idea of 15,000 lines of uncommented code fills me with amazement that anyone could convince themselves they understood what was going on.

John
24 days ago

Aside from the code issues revealed by Sue Denim’s eye-opening analysis, I wonder which hardware configuration Mr. Ferguson used to run his model on.

If it was an Intel-based rig then that may have introduced some further variables given Intel’s well publicised litany of security issues with their SoCs, ME and AMT (depending on the generation involved).

Then there’s the RAM. I would certainly hope that his computer was running ECC RAM which had been well tested. Even so, it’s my understanding that rowhammer attacks can be an issue with DDR3 ECC RAM and to a lesser degree DDR4 ECC RAM as well.

And what about the machine’s network security?

I’m mentioning these things because I feel they may be worth considering as a wild card, so to speak.

Stranger hacks have happened. For example:

2018 – “May – Bitcoin Gold – $18 Million Worth of BTG

This is probably one of the stranger hacks on our list, as a cryptocurrency exchange wasn’t hacked but a cryptocurrency was. Bitcoin Gold was an offshoot of the original Bitcoin, which took a hard fork from Bitcoin as an attempt to decentralize (ironic given that Bitcoin is already decentralized).

Bitcoin Gold became the victim of a 51% attack, a rare occurrence where hackers managed to gain control of more than 50% of the networks computing power. From there, attackers can prevent confirmations, allowing them to effectively stop payments between users and make changes to the network’s blockchain ledger. This type of attack was thought to be rare, if not impossible, until the Bitcoin Gold incident.

Using some complicated maneuvers, hackers put their Bitcoin Gold onto exchanges, traded them for other cryptocurrencies, then withdrew the amount. And because they had control of Bitcoin Gold’s blockchain ledger, they could simply return the original Bitcoin Gold back into their own wallet, essentially stealing money from exchanges.”

15.04.2020
A Comprehensive List of Cryptocurrency Exchange Hacks – SelfKey
https://selfkey.org/list-of-cryptocurrency-exchange-hacks/

As an aside, the 51% attack described above is further interesting in view of the lemmings-rush towards blockchain-based digital national currencies linked to a digital global currency (IMF SDRs), not to mention ID2020 etc–given the steady progress in developing quantum computers.

John
24 days ago
Reply to  John

There’s also this tangent:

A reason to suspect that blockchain technology originated from an affiliate of the City of London’s oligarchy and therefore may contain a hidden back door enabling a 51% attack:

“Satoshi Nakamoto is the name used by the presumed pseudonymous[1][2][3][4] person or persons who developed bitcoin, authored the bitcoin white paper, and created and deployed bitcoin’s original reference implementation.[5] As part of the implementation, Nakamoto also devised the first blockchain database.[6] In the process, Nakamoto was the first to solve the double-spending problem for digital currency using a peer-to-peer network. Nakamoto was active in the development of bitcoin up until December 2010.[7] Many people have claimed, or have been claimed, to be “Satoshi”. …

… The use of British English in both source code comments and forum postings – such as the expression “bloody hard”, terms such as “flat” and “maths”, and the spellings “grey” and “colour”[17] – led to speculation that Nakamoto, or at least one individual in the consortium claiming to be him, was of Commonwealth origin.[22][10][25] The reference to London’s The Times newspaper in the first bitcoin block mined by Satoshi suggested to some a particular interest in the British government.[17][27]:18″ …”

Satoshi Nakamoto – Wikipedia
https://en.wikipedia.org/wiki/Satoshi_Nakamoto

‘Just a thought.

Mike Whittaker
24 days ago
Reply to  John

What on Earth have security vulnerabilities to do with running a software model?
ECC RAM issues and Rowhammer attacks have nothing to do with everyday computing.

Threadchecker
24 days ago

In order to get repeatable results with threading enabled, comment out line 2203 of SetupModel.cpp:

#pragma omp parallel for private(i,i2,j,k,l,m,f2,f3,t,ct,s) reduction(+:ca)

Dario
24 days ago

Here is a validation of any epidemiology model against real-world numbers. Zagreb’s earthquake on March 22nd revealed a lot about the characteristics of Covid-19: a month after the earthquake there was no increase in the number of infected or ill people. On the contrary, there were on average only 15 infected people per day a month after the earthquake. And about a million people were on the streets of Croatia’s capital for almost the whole of that day, at a temperature of about zero degrees, along with hospital patients. So, the coronavirus is not so contagious and dangerous for the whole population, but only for some smaller group of people with specific health characteristics that are yet to be found. And it is not clear whether the coronavirus is the primary cause of the disease at all.

Mike Whittaker
24 days ago
Reply to  Dario

No; with such a small sample size, you cannot draw such conclusions. If none of the infected parties was in the crowded area, there would have been no transmission.

Mike Whittaker
24 days ago
Reply to  Dario

Visit Northern Italy …

Dario
24 days ago
Reply to  Mike Whittaker

That contradicts the claim that the virus is very contagious and could easily be spread by contact within 2m – which is the main reason for the lockdown…

Mike Whittaker
23 days ago
Reply to  Dario

Depends where the infected persons are. As stated, with such a small infected base, it shifts down from continuum modelling to discrete events, very much dependent on the actual path of events.

Mike Whittaker
24 days ago

Insurers evaluate on a premiums-paid / pay-outs basis, and look to financial averages.

For the health of a country, we are looking at metrics like “avoidable deaths” and QALYs…

polocanthus
23 days ago
Reply to  Mike Whittaker

I think there may be a tentative link between deaths and insurance payouts.

Mike Whittaker
23 days ago
Reply to  polocanthus

Insurers can average things out, financially.

polocanthus
24 days ago

Researchers hacking out some rough code to test a new idea is one thing…
…developing code over 15 years requires different standards.

In Ferguson’s own blurb from his university profile:

An important strand of my research program is therefore to develop the statistical and mathematical tools necessary for such increasingly sophisticated models to be rigorously tested and validated against epidemiological, molecular and experimental data.

Of course the grunt work goes to the research students. Maybe they didn’t tell him what a mess his model was getting into.