by Sue Denim (not the author’s real name)
Imperial finally released a derivative of Ferguson’s code. I figured I’d do a review of it and send you some of the things I noticed. I don’t know your background so apologies if some of this is pitched at the wrong level.
My background. I wrote software for 30 years. I worked at Google between 2006 and 2014, where I was a senior software engineer working on Maps, Gmail and account security. I spent the last five years at a US/UK firm where I designed the company’s database product, amongst other jobs and projects. I was also an independent consultant for a couple of years. Obviously I’m giving only my own professional opinion and not speaking for my current employer.
The code. It isn’t the code Ferguson ran to produce his famous Report 9. What’s been released on GitHub is a heavily modified derivative of it, after having been upgraded for over a month by a team from Microsoft and others. This revised codebase is split into multiple files for legibility and written in C++, whereas the original program was “a single 15,000 line file that had been worked on for a decade” (this is considered extremely poor practice). A request for the original code was made 8 days ago but ignored, and it will probably take some kind of legal compulsion to make them release it. Clearly, Imperial are too embarrassed by the state of it ever to release it of their own free will, which is unacceptable given that it was paid for by the taxpayer and belongs to them.
The model. What it’s doing is best described as “SimCity without the graphics”. It attempts to simulate households, schools, offices, people and their movements, etc. I won’t go further into the underlying assumptions, since that’s well explored elsewhere.
Non-deterministic outputs. Due to bugs, the code can produce very different results given identical inputs. They routinely act as if this is unimportant.
This problem makes the code unusable for scientific purposes, given that a key part of the scientific method is the ability to replicate results. Without replication, the findings might not be real at all – as the field of psychology has been finding out to its cost. Even if their original code was released, it’s apparent that the same numbers as in Report 9 might not come out of it.
Non-deterministic output may need some explanation, as it isn’t something anyone had previously floated as a possibility.
The documentation says:
The model is stochastic. Multiple runs with different seeds should be undertaken to see average behaviour.
“Stochastic” is just a scientific-sounding word for “random”. That’s not a problem if the randomness is intentional pseudo-randomness, i.e. the randomness is derived from a starting “seed” which is iterated to produce the random numbers. Such randomness is often used in Monte Carlo techniques. It’s safe because the seed can be recorded and the same (pseudo-)random numbers produced from it in future. Any kid who’s played Minecraft is familiar with pseudo-randomness because Minecraft gives you the seeds it uses to generate the random worlds, so by sharing seeds you can share worlds.
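Here is what that reproducibility property looks like in practice – a minimal sketch in Python rather than the model’s C++, but the principle is identical: record the seed and you can replay the “random” run exactly.

```python
import random

# Two generators created with the same seed produce identical streams,
# so any simulation driven this way can be replayed bit-for-bit.
rng_a = random.Random(42)
rng_b = random.Random(42)

stream_a = [rng_a.random() for _ in range(5)]
stream_b = [rng_b.random() for _ in range(5)]

assert stream_a == stream_b   # same seed: same numbers, every time

# A different seed gives a different (but equally replayable) stream.
assert stream_a != [random.Random(43).random() for _ in range(5)]
```

This is the guarantee the documentation implies the model provides: stochastic in method, deterministic given the seed.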
Clearly, the documentation wants us to think that, given a starting seed, the model will always produce the same results.
Investigation reveals the truth: the code produces critically different results, even for identical starting seeds and parameters.
I’ll illustrate with a few bugs. In issue 116 a UK “red team” at Edinburgh University reports that they tried to use a mode that stores data tables in a more efficient format for faster loading, and discovered – to their surprise – that the resulting predictions varied by around 80,000 deaths after 80 days (the issue thread includes a chart of the divergence).
That mode doesn’t change anything about the world being simulated, so this was obviously a bug.
The Imperial team’s response is that it doesn’t matter: they are “aware of some small non-determinisms”, but “this has historically been considered acceptable because of the general stochastic nature of the model”. Note the phrasing here: Imperial know their code has such bugs, but act as if it’s some inherent randomness of the universe, rather than a result of amateur coding. Apparently, in epidemiology, a difference of 80,000 deaths is “a small non-determinism”.
Imperial advised Edinburgh that the problem goes away if you run the model in single-threaded mode, like they do. This means they suggest using only a single CPU core rather than the many cores that any video game would successfully use. For a simulation of a country, using only a single CPU core is obviously a dire problem – as far from supercomputing as you can get. Nonetheless, that’s how Imperial use the code: they know it breaks when they try to run it faster. It’s clear from reading the code that in 2014 Imperial tried to make the code use multiple CPUs to speed it up, but never made it work reliably. This sort of programming is known to be difficult and usually requires senior, experienced engineers to get good results. Results that randomly change from run to run are a common consequence of thread-safety bugs. More colloquially, these are known as “Heisenbugs”.
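One way multi-threaded runs drift even without an outright race, sketched here in Python: thread scheduling changes the order in which partial results are combined, and floating-point addition is not associative, so the grouping alone changes the total.

```python
# Floating-point addition is not associative: the grouping (i.e. the
# order in which threads happen to combine their partial sums)
# changes the result, even on identical inputs.
xs = [0.1, 0.2, 0.3]

left_to_right = (xs[0] + xs[1]) + xs[2]   # one possible interleaving
right_to_left = xs[0] + (xs[1] + xs[2])   # another interleaving

assert left_to_right != right_to_left     # same inputs, different total
```

Genuine thread-safety bugs (unsynchronised writes to shared state) are worse still; this just shows why even a “benign” scheduling difference is enough to break bit-for-bit replication.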
But Edinburgh came back and reported that – even in single-threaded mode – they still see the problem. So Imperial’s understanding of the issue is wrong. Finally, Imperial admit there’s a bug by referencing a code change they’ve made that fixes it. The explanation given is “It looks like historically the second pair of seeds had been used at this point, to make the runs identical regardless of how the network was made, but that this had been changed when seed-resetting was implemented”. In other words, in the process of changing the model they made it non-replicable and never noticed.
Why didn’t they notice? Because their code is so deeply riddled with similar bugs and they struggled so much to fix them that they got into the habit of simply averaging the results of multiple runs to cover it up… and eventually this behaviour became normalised within the team.
In issue #30, someone reports that the model produces different outputs depending on what kind of computer it’s run on (regardless of the number of CPUs). Again, the explanation is that although this new problem “will just add to the issues” … “This isn’t a problem running the model in full as it is stochastic anyway”.
Although the academic on those threads isn’t Neil Ferguson, he is well aware that the code is filled with bugs that create random results. In change #107 he authored he comments: “It includes fixes to InitModel to ensure deterministic runs with holidays enabled”. In change #158 he describes the change only as “A lot of small changes, some critical to determinacy”.
Imperial are trying to have their cake and eat it. Reports of random results are dismissed with responses like “that’s not a problem, just run it a lot of times and take the average”, but at the same time, they’re fixing such bugs when they find them. They know their code can’t withstand scrutiny, so they hid it until professionals had a chance to fix it, but the damage from over a decade of amateur hobby programming is so extensive that even Microsoft were unable to make it run right.
No tests. In the discussion of the fix for the first bug, Imperial state the code used to be deterministic in that place but they broke it without noticing when changing the code.
Regressions like that are common when working on a complex piece of software, which is why industrial software-engineering teams write automated regression tests. These are programs that run the program with varying inputs and then check the outputs are what’s expected. Every proposed change is run against every test and if any tests fail, the change may not be made.
The Imperial code doesn’t seem to have working regression tests. They tried, but the extent of the random behaviour in their code left them defeated. On 4th April they said: “However, we haven’t had the time to work out a scalable and maintainable way of running the regression test in a way that allows a small amount of variation, but doesn’t let the figures drift over time.”
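A minimal regression harness – assuming a hypothetical deterministic `simulate(seed)` entry point, which is not what the Imperial code currently offers – is only a few lines: run the model, compare against a recorded golden output, and fail on any drift.

```python
import random

def simulate(seed):
    # Stand-in for the model: fully deterministic given its seed.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000))

def regression_test(seed):
    # In a real harness the "golden" output is recorded once and kept
    # in version control; every later run (e.g. after each code change)
    # must then reproduce it bit-for-bit.
    golden = simulate(seed)
    assert simulate(seed) == golden, "output drifted from golden value"

regression_test(123)
```

The point is that this only works if the model is deterministic given its seed – which is exactly the property Imperial’s code lacks, and why their own attempt at regression testing was defeated.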
Beyond the apparently unsalvageable nature of this specific codebase, testing model predictions faces a fundamental problem, in that the authors don’t know what the “correct” answer is until long after the fact, and by then the code has changed again anyway, thus changing the set of bugs in it. So it’s unclear what regression tests really mean for models like this – even if they had some that worked.
Undocumented equations. Much of the code consists of formulas for which no purpose is given. John Carmack (a legendary video-game programmer) surmised that some of the code might have been automatically translated from FORTRAN some years ago.
For example, on line 510 of SetupModel.cpp there is a loop over all the “places” the simulation knows about. This code appears to be trying to calculate R0 for “places”. Hotels are excluded during this pass, without explanation.
This bit of code highlights an issue Caswell Bligh has discussed in your site’s comments: R0 isn’t a real characteristic of the virus. R0 is both an input to and an output of these models, and is routinely adjusted for different environments and situations. A model that consumes its own outputs as inputs is a problem well known to the private sector – it can lead to rapid divergence and incorrect predictions. There’s a discussion of this problem in section 2.2 of the Google paper, “Machine learning: the high interest credit card of technical debt”.
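A toy illustration (not the actual model’s dynamics) of why output-fed-back-as-input is dangerous: any systematic bias in the feedback step compounds on every pass, so the estimate diverges rapidly from reality.

```python
def refit(r_estimate):
    # Hypothetical re-calibration step with a small systematic bias:
    # each pass inflates the estimate by 10%.
    return 1.1 * r_estimate

r = 1.0
for _ in range(50):      # fifty rounds of output fed back in as input
    r = refit(r)

assert r > 100           # a 10% per-pass bias has compounded over 100x
```

With a feedback loop, a modest per-step error doesn’t stay modest: it grows geometrically, which is why the Google paper singles this pattern out as a form of hidden technical debt.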
Continuing development. Despite being aware of the severe problems in their code that they “haven’t had time” to fix, the Imperial team continue to add new features; for instance, the model attempts to simulate the impact of digital contact tracing apps.
Adding new features to a codebase with this many quality problems will just compound them and make them worse. If I saw this in a company I was consulting for I’d immediately advise them to halt new feature development until thorough regression testing was in place and code quality had been improved.
Conclusions. All papers based on this code should be retracted immediately. Imperial’s modelling efforts should be reset with a new team that isn’t under Professor Ferguson, and which has a commitment to replicable results with published code from day one.
On a personal level, I’d go further and suggest that all academic epidemiology be defunded. This sort of work is best done by the insurance sector. Insurers employ modellers and data scientists, but also employ managers whose job is to decide whether a model is accurate enough for real world usage and professional software engineers to ensure model software is properly tested, understandable and so on. Academic efforts don’t have these people, and the results speak for themselves.
My identity. Sue Denim isn’t a real person (read it out). I’ve chosen to remain anonymous partly because of the intense fighting that surrounds lockdown, but there’s also a deeper reason. This situation has come about due to rampant credentialism and I’m tired of it. As the widespread dismay by programmers demonstrates, if anyone in SAGE or the Government had shown the code to a working software engineer they happened to know, alarm bells would have been rung immediately. Instead, the Government is dominated by academics who apparently felt unable to question anything done by a fellow professor. Meanwhile, average citizens like myself are told we should never question “expertise”. Although I’ve proven my Google employment to Toby, this mentality is damaging and needs to end: please, evaluate the claims I’ve made for yourself, or ask a programmer you know and trust to evaluate them for you.

Devastating. Heads must roll for this, and fundamental changes be made to the way government relates to academics and the standards expected of researchers. Imperial College should be ashamed of themselves.
The UK government should be just as ashamed for taking their advice.
And anyone in the media who repeated their nonsense.
The problem is the nature of government and politics. Politics is a systematic way of transferring the consequences of inadequate or even reckless decision-making to others without the consent or often even the knowledge of those others. Politics and science are inherently antithetical. Science is about discovering the truth, no matter how inconvenient or unwelcome it may be to particular interested parties. Politics is about accomplishing the goal of interested parties and hiding any truth that would tend to impede that goal. The problem is not that “government has been doing it wrong”; the problem is that government has been doing it.
This article explains how such software should be written. (After the domain experts have reasoned out a correct model and had it verified by open peer review, and if possible by formal methods).
“They Write the Right Stuff” by Charles Fishman, December 1996
https://www.fastcompany.com/28121/they-write-right-stuff
After all, only 7 lives depended directly on the Space Shuttle software. The Imperial College program seems likely to have cost many thousands of extra deaths, and to have seriously damaged the economies and societies of scores of countries, affecting possibly billions of lives.
So why should the resources invested in the two efforts have been so vastly different?
But they won’t. Everyone involved in this now has skin in the game to ensure NOTHING happens and the lockdown carries on as if it’s the only thing keeping the entire country from dying.
Well, this is exactly why there is a growing movement in academia at grassroots level to campaign for groups to use proper software practices (version control, automated testing and so on).
Vital to factor in Britain’s endemic corruption before seeking head-roll redress. There is none.
I speak from experience. Case study: https://spoilpartygames.co.uk/?page_id=4454
Thank you so much for this! This code should’ve been available from the outset.
Amateur Hour all round!
The code should have been made available to all other Profs & top Coders & Data Scientists & Bio-Statisticians to PEER Review BEFORE the UK and US governments made their decisions. Imperial should be sued for such amateur work.
Guy at carnival: Here, drink this
Some ol’bloke : What is it?
Guy at carnival: Never mind, it will fix what’s ailing ya
Some ol’bloke : What’s it cost?
Guy at carnival: It doesn’t matter, it’s a deal at twice the price
Some ol’bloke : What’s in it?
Guy at carnival: Shhhhh, just take 3 swigs
Some ol’bloke : It tastes horrible
Guy at carnival: Ya, but it will help you
Some ol’bloke : …if you say so
Guy at carnival: I know hey, but you feel better already
But “This code” isn’t what Ferguson was running. The code on GitHub has been munged by other authors in an attempt to make it less horrifying. We must remember that what he ran was much worse than what we can see, which is bad enough.
This is an outstanding investigation. Many thanks for doing it – and to Toby for providing a place to publish it.
So this is ‘the science’ that the Government thinks it is following!
*the Government reminds us*
This isn’t a piece of poor software for a computer game; it is, apparently, the useless software that has shut down the entire western economy. Not only will it have wasted staggeringly vast sums of money, but every day we are hearing of the lives that will be lost as a result.
We are today learning of 1.4 million avoidable deaths from TB but that is nothing compared to the UN’s own forecast of “famine on a biblical scale”. Does one think that the odious, inept, morally bankrupt hypocrite, Ferguson, will feel any shame, sorrow or remorse if, heaven forbid, the news in a couple of months’ time is dominated by the deaths of hundreds of thousands of children from starvation in the 3rd World, or will his hubris protect him?
I don’t understand why governments are still going for this ridiculous policy and NGOs all pretend it is Covid-19 that will cause this devastation RATHER than our reaction to it.
Imperial and the Professor should start to worry about claims for losses incurred as a result of decisions taken based on such a poor effort. Could we know, please, what this has cost, over how many years, and how much of the Professor’s career has been achieved on the back of it?
Remember that Ferguson has a track record of failure:
In 2002 he predicted 50,000 people would die of BSE. Actual number: 178 (national CJD research and surveillance team)
In 2005 he predicted 200 million people would die of avian flu H5N1. Actual number according to the WHO: 78
In 2009 he predicted that swine flu H1N1 would kill 65,000 people. Actual number: 457.
In 2020 he predicted 500,000 Britons would die from Covid-19.
Still employed by the government. Maybe 5th time lucky?
Maybe but he’ll have to step up his game.
Pathetic review. You should go through the logic of what is coded, not write superficial criticisms which imply you know nothing of what you critique.
If only the code could actually be understood. It’s so bad you can’t even be certain of what exactly it’s doing.
Pretty sure the only point of the article was to bring light to the fact that the “model” is flawed and Ferguson has a track record of being VERY wrong on mortality rate predictions based upon flawed models. Solution: stop it. This time around it almost took down an entire country’s economy because of elitists’ overreaction and overreach. Just stop it.
Why any of this isn’t obvious to our politicians says a lot about our politicians, but your summary also shows that it is ENGINEERS and not academics who should be generating the input to policy making. It is only engineers who have the discipline to make things work, properly and reliably.
For decades I have opined that our society was exposed to the risk inherent in being a technologically dependent culture governed by the technically illiterate. QED?
“The Chinese Government Is Dominated by Scientists and Engineers”
https://gineersnow.com/leadership/chinese-government-dominated-scientists-engineers
This kind of thing frequently happens with academic research. I’m a statistician and I hate working with academics for exactly this sort of reason.
the global warming models are secret too (mostly) and probably the same kind of mess as this code
Perhaps, if enough people come to understand how badly this has been managed, they will start to ask the same questions of the climate scientists and demand to see their models published.
It could be the start of some clearer reasoning on the whole subject, before we spend the trillions that are being demanded to avert or mitigate events that may never happen.
Michael Mann pointedly refused to share his modelling code for climate change when he was sued for libel in a Canadian court. He ended up losing, which will cost him millions. Now why would an academic rather lose millions of dollars than show their working?
Let’s hope this “workings not required” attitude doesn’t get picked up by schoolkids taking their exams 🙂
Tried to find something about this on the BBC news site. Found this:
https://www.bbc.com/news/uk-politics-52553229
At the end of the article, there is “analysis” from a BBC health correspondent.
With such pitiful performance from the national broadcaster, I think Ferguson and his team will face no consequences.
It raises the questions (a) what other academic models that have driven public policy have such bad quality?, and (b) do the climate models suffer in the same way, also making them untrustworthy?
Similar skeptical attention should be paid to the credibility automatically granted to economic model projections – even for decades ahead. Economic estimates are routinely treated as facts by the biggest U.S. newspaper and TV networks, particularly if the estimates are (1) from the Federal Reserve or Congressional Budget Office, and (2) useful as a lobbying tool to some politically-influential interest group.
Just wonderful and sadly utterly devastating. As an IT bod myself and an early-days skeptic, this was such a pleasure to read. Well done.
Thanks for doing the analysis. Totally agree that leaving this kind of job to amateur academics is completely nonsensical. I like your suggestion of using the insurance industry, and if I were PM I would take it up immediately.
Look at SetupModel.cpp from line 2060 – pages of nested conditionals and loops with nary a comment. Nightmare!
The best is there’s all this commented out code. Was it commented out by accident? Was there a reason for it being there to begin with? Who knows, it’s a mystery.
Haven’t time to read the article and stopped at the portion where the data can’t be replicated. That right there is a huuuuuuge red flag and makes the “models” useless. I’ll come back tonight to finish reading. I have to ask: is this the same with the University of Washington IHME models? Why do I have a sneaking suspicion that it is.
The IHME ‘model’ is much worse – it’s just a simple exercise in curve fitting, with little or no actual modelling happening at all. I have collected screenshots of its predictions (for the US, UK, Italy, Spain, Sweden) every few days over the last few weeks, so I could track them against reality, and it is completely useless. But, according to what I’ve read, the US government trusts it!
Until a few days ago, its curves didn’t even look plausible – for countries on a downward trend (e.g. Italy and Spain), they showed the numbers falling off a cliff and going down to almost zero within days, and for countries still on an upward trend (e.g. the UK and Sweden) they were very pessimistic. However, the figures for the US were strangely optimistic – maybe that’s why the White House liked them.
They seem to have changed their model in the last few days – the curves look more plausible now. However, plausible looking curves mean nothing – any one of us could take the existing data (up to today) and ‘extrapolate’ a curve into the future. So plausibility means nothing – it’s just making stuff up based on pseudo-science. In the UK, we’re not supposed to dissent, because that implies that we don’t want to ‘save lives’ or ‘protect the NHS’, so the pessimistic model wins. In the US, it’s different, depending on people’s politics, so I’m not going to try to analyse that.
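The fragility of extrapolation is easy to demonstrate. In the sketch below (illustrative numbers, not IHME’s actual curves), an exponential and a saturating logistic are nearly indistinguishable over the early “observed” data, yet differ by two orders of magnitude a few weeks out – so a good fit to the past tells you almost nothing about which trajectory is right.

```python
import math

def exponential(t):
    # Unbounded growth at rate 0.3 per day.
    return math.exp(0.3 * t)

def logistic(t, ceiling=1000.0):
    # Same early growth rate, but saturating at `ceiling`.
    return ceiling / (1.0 + (ceiling - 1.0) * math.exp(-0.3 * t))

# Over the "observed" window the two models agree to within 1%...
for t in range(6):
    assert abs(exponential(t) - logistic(t)) / logistic(t) < 0.01

# ...but the extrapolations differ by more than 100x at t = 40.
assert exponential(40) / logistic(40) > 100
```

Past data equally consistent with both curves cannot tell you whether to expect runaway growth or a plateau, which is the whole problem with presenting a plausible-looking fitted curve as a prediction.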
So why do governments leap at these pseudo-models with their useless (but plausible-looking) predictions? It’s because they hate not knowing what’s going to happen, so they are willing to believe anyone with academic credentials who claims to have a crystal ball. And, if there are competing crystal balls from different academics, the government will simply pick the one that matches its philosophy best, and claim that it is ‘following the science’.
Ditto. The IHME predictions are completely silly.
They leap at them for fear of the MSM accusing them of not doing anything.
I had hoped Donald Trump would be a stronger leader than that and would have insisted on any model being independently and repeatedly verified before making any decision.
The other factor that seems entirely missing from the models is the ability of existing medicines, even off-label ones, to treat the virus, and there have been many trials of hydroxychloroquine with zinc sulphate (& some also with azithromycin) that have demonstrated great success. It constantly dismays me that this is ignored, and here in the UK, patients are just given paracetamol; as if they have a headache!!
I offer a critical review of past and present IHME death projections here: https://www.cato.org/blog/six-models-project-drop-covid-19-deaths-states-open
This is scary stuff. I’ve been a professional developer and researcher in the finance sector for 12 years. My background is a physics PhD. I have seen this sort of single-file code structure a lot and it is a minefield for bugs. This can be mitigated to some extent by regression tests, but those are only as good as the number of test scenarios that have been written. Randomness cannot just be dismissed like this. It is difficult to nail down non-determinism, but it can be done, and it requires the developer to adopt some standard practices to lock down the computation path. It sounds like the team have lost control of their codebase and have their heads in the sand. I wouldn’t invest money in a fund that was so shoddily run. The fact that the future of the country depends on such code is a scandal.
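One standard way to “lock down the computation path”, as this commenter puts it – sketched here with a hypothetical per-item work unit, not the Imperial code’s actual structure – is to derive an independent seed for each work item from the run seed and a stable item ID, so results cannot depend on the order in which a scheduler happens to process them.

```python
import random

def item_result(run_seed, item_id):
    # Each work item gets its own generator, derived from the run seed
    # and a stable item ID, so no random stream is shared between items.
    rng = random.Random(run_seed * 1_000_003 + item_id)
    return rng.random()

def run(run_seed, order):
    # Process items in whatever order the scheduler picks...
    return {i: item_result(run_seed, i) for i in order}

# ...and the results are identical regardless of that order.
assert run(7, [0, 1, 2, 3]) == run(7, [3, 1, 0, 2])
```

With per-item streams, multi-threaded and single-threaded runs of the same seed must agree, which is exactly the determinism the Edinburgh red team found missing.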
‘Software volatility’ is the expression, Robin, and it is always bad.
I have not looked at Neil Ferguson’s model and I’m not interested in doing so. Ferguson has not influenced my thinking in any way and I have reached my own conclusions, on my own. I made my own calculation at the end of January, estimating the likely mortality rate of this virus. I’m not going to tell you the number, but suffice to say that I decided to fly to a different country, stock up on food, and lock myself up so I had no contact with anybody, right at the beginning of February, when nobody else was stocking up yet, nobody else was locking themselves up, and people thought it was all a bit strange. When I flew to my isolation location, I wore a mask, and everyone thought it was a bit strange. Make your own conclusions.
I’ve read this review.
Firstly, I’ll stress this again, I’m not going to defend Ferguson’s model. I have not seen it. I don’t know what it’s like. I don’t know if it’s any good.
I don’t share Ferguson’s politics, even less so those of his girlfriend.
His estimate of the number that would likely die if we took no public health measures IMO is not an over-estimate. There are EU countries which have conducted tests of large random, unbiased, samples of their population to estimate what percentage of their population has had the virus. The number – in case of those countries – comes out at 2%-3%. If the same is true of the UK, then 30,000 deaths would translate to 1 million deaths if the virus infected everybody. Of course, we don’t know if the same is true of the UK.
But now I am going to criticize this criticism of Ferguson’s model, because it deserves criticism.
I’ve been writing software for 41 years. Including modeling and simulation software. I wrote my first stochastic Monte Carlo simulator 37 years ago. I have written millions of lines of code in dozens of different programming languages. I have designed many mathematical models, including stochastic ones.
Ferguson’s code is 30 years old. This review criticizes it as though it were written today, but many of these criticisms are simply not valid when applied to code that’s 30 years old. It was normal to write code that way 30 years ago. Monolithic code was much more common, especially for programs that were not meant to produce reusable components. Disk space, RAM, and CPU speeds were not amenable to code being structured to the same extent it is today. Yes, structured programming was known, yes, software libraries were used, but programs like simulation software generally consisted of at most a handful of different source files.
30 years ago, there was no multi-threading, so it was reasonable to write programs on the assumption that they were going to run on a single-threaded CPU. With few exceptions, like people working on Transputers, nobody had access to a multi-threaded computer. I can’t say what is making his code not thread safe, but not being thread safe does not necessarily imply bad coding style, or bad code. There are many functions even in the standard C library which are not thread safe, and some that come in two flavours – thread safe and not thread safe. The thread safe version normally has more overhead and it is less efficient. Today, this may make no difference, but 30 years ago, that mattered. A lot. Writing code which was not thread safe, if you were optimizing for speed, may have made perfect sense.
While not documenting your programs was not great practice even back then, it was also very common, especially for programs which were initially designed for a very specific application, and were not meant to be reused in other projects or libraries. There is nothing particularly unusual about this.
It’s perfectly normal not to want to disclose 30 year old code because, as has been proven by this very review, people will look at it and criticize it as if it was modern code.
So Ferguson evidently rewrote his program to be more consistent with modern coding standards before releasing it. And probably introduced a couple of bugs in the process. Given the fact that the original code was undocumented, old, and that he was under time pressure to produce it in a hurry, it would have been strange if this didn’t introduce some bugs. This does not, per se, invalidate the model. Your review does not give any reason to think these bugs existed in the original code or that they were material.
The review criticizes the code because the model used is stochastic. Which means random, the review goes on to explain. Random – surely this must be bad! But stochastic models and Monte Carlo simulation are absolutely standard techniques. They are used by financial institutions, they were used 30 years ago for multi-dimensional numerical integration, they are used everywhere. The very nature of the system being modeled is fundamentally and intrinsically stochastic. Are you saying you have a model which can predict, with certainty, how many dead people there will be 2 weeks from now? No, of course you don’t. This depends on so many variables, most of which are random, and so they have to be modeled as being random. From the way you describe the model (SimCity-like), it sounds like it models individual actors, so ipso facto it has to be stochastic. How else do you model the actions of many independent individual human actors?
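The two positions here are compatible, as a few lines of Python show: a Monte Carlo estimate (of pi below, standing in for any stochastic simulation) is genuinely random in method, yet exactly replicable when driven from a recorded seed. Stochastic and reproducible are not in tension.

```python
import random

def mc_pi(seed, n=100_000):
    # Monte Carlo: sample points in the unit square, count how many
    # fall inside the quarter circle, scale up to estimate pi.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

assert mc_pi(1) == mc_pi(1)              # stochastic, yet replicable
assert abs(mc_pi(1) - 3.14159) < 0.05    # and a reasonable estimate
```

The review’s complaint is not that the model is stochastic, but that identical seeds fail to reproduce identical outputs – the property the assertion above demonstrates and the Imperial code lacks.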
I don’t know the author or anything about her background. But it doesn’t sound to me like she was writing software or making mathematical models 30 years ago, or she wouldn’t be making many of the statements she is making.
Reviewing Ferguson’s model in depth is certainly something that someone ought to do. But a serious review would understand what the (stochastic) model does, explain what it does, and assess the model on its merits. I have no idea whether the model would survive such a review well or be torn to shreds by it. But this review just scratches the surface, and criticizes Ferguson’s software in very superficial ways, largely completely unwarranted. It does not even present the substance of the model.
I read the author’s discussion of the single-thread/multi-thread issue not so much as a criticism but as a rebuttal to possible counter-arguments. I agree it probably should have been left out (or relegated to a footnote), but the rest of the author’s arguments stand independently of the multi-thread issues.
I disagree with your framing of the author’s other criticisms as amounting to criticism of stochastic models. It does not appear the author has an issue with stochastic models, but rather with models where it is impossible to determine whether the variation in outputs is a product of intended pseudo-randomness or whether the variation is a product of unintended variability in the underlying process.
dr_t,
I am also a software engineer, with over 35 years of experience, so I understand what you are saying as far as 30-year-old code goes. However, if the software is not fit for purpose because it is riddled with bugs, then it should not be used for making policy decisions. And frankly I don’t care how old the code is: if it is poorly written and documented, then it should be thrown out and rewritten, otherwise it is useless.
As a side note, I currently work on a code base that is pure C and close to 30 years old. It is properly composed of manageable-sized units and reasonably organized. It also has up-to-date function specifications and decent regression tests. When this was written, these were probably cutting-edge ideas, but they clearly weren’t unknown. Since then we’ve upgraded to using current tech compilers, source code repositories, and critical peer review of all changes.
So there really is no excuse for using software models that are so deficient. The problem is these academics are ignorant of professional standards in software development and frankly don’t care. I’ve worked with a few over the course of my career and that has been my experience every time.
I agree 100%, I wrote c/c++ code for years and this single file atrocity reminds me of student code
The fact it wasn’t refactored in 30 years is a sin plain and simple.
Moreover, this was likely the ‘code’ used for his swine flu predictions – which performed magnificently 😉
I was coding on a large multi-language and multi-machine project 40 years ago. This was before Jackson Structured Programming, but we were still required to document, to modularise, and to perform regression testing as well as testing of new functionality. These were not new ideas when this model was originally created.
The point of key importance is that code must be useful to the user. This is normally ensured by managers providing feedback from the business and specifying user requirements in better detail as the product develops. And this stage was, of course, missing here.
Instead we had the politicians deferring to the ‘scientists’, who were trying out a predictive model untested against real life. That seems to have worked out about as well as if you had sacked the sales team of a company and let the IT manager run sales simulations on his own according to a theory which had been developed by his mates…
> untested against real life.
And _untestable_? There is no mention in the review of how many parameter values need to be fixed to produce a run. If it’s more than 6-10, I cannot imagine a search for best-fit parameters [to past data] resulting in values that are stable over time.
All I know is that my son is the same as Ferguson: a physics PhD, BUT he is now a commercial machine-learning data scientist. However, he has spent five years out of academia learning the additional software skills required, passing all the AWS certs, etc. Ferguson didn’t.
How wrong you will be proved to be. Testing is already indicating that huge numbers of the global population have already caught it. The virus has been in Europe since December at the latest, and as more information comes to light, that date will likely be moved significantly backwards. If the R0 is to be believed, the natural peak would have been hit, with or without lockdown, in March or April. That is what we have seen.
This virus will be proven to be less deadly than a bad strain of influenza, with or without a vaccinated population. Total deaths only peaked post-lockdown. That is not a coincidence.
@Robbo Why is it not a coincidence? I am not sure what to think about this virus: you say it will proven to be like a bad strain of influenza, but I work in a hospital and our clinical staff are saying they have never seen anything like it in terms of number of deaths.
The empty hospitals full of TikTok stars?
I would not be surprised at a large number of initial deaths with a new disease when the medical staff have no protocol for dealing with it. In fact, I understand that their treatment was sub-optimal and could have made things worse.
When we have a treatment for it we will see how dangerous it is compared to flu. Which can certainly kill you if not treated properly…
How many of these clinical staff were working during the 1957 pandemic? Probably …. none. It was worse on an absolute and per-capita basis than what we’re seeing now.
Brilliant comment. This model assumes first infections at least two months too late. The unsuppressed peak was supposed to be mid-May (the ‘terrifying’ graph), so what we have seen in April is likely the real peak, and lockdown has had no impact on the virus. Lockdown will have killed far more people. The elderly see no point in living in lockdown. There are anecdotal reports that people in care homes have just stopped eating.
Nope. Base rate. Look outside your own wee land.
Spain is a better example.
Just model Spain with a simple statistical model and you see the lockdown impact.
It’s easy, you can do it in an afternoon.
> If the R0 is to be believed, the natural peak would have been hit, with or without lockdown, in March or April. That is what we have seen.
That’s what we’ve seen WITH lockdown. We haven’t tried a no-lockdown scenario, so we don’t know in practice when that would have peaked.
> This virus will be proven to be less deadly than a bad strain of influenza
Flu kills around 30,000/year in the US, mostly over a five-month period. Covid-19 has killed 70,000 in about six weeks, despite the lockdown.
@Frank,
“That’s what we’ve seen WITH lockdown. We haven’t tried a no-lockdown scenario, so we don’t know in practice when that would have peaked”.
Incorrect.
Peak deaths in NHS hospitals in England were 874 on 08/04. A week earlier, on 01/04, there were 607 deaths. Crude Rt = 874/607 = 1.4. On average, a patient dying on 08/04 would have been infected c. 17 days earlier on 22/03. So, by 22/03 (before the full lockdown), Rt was (only) approx 1.4.
Ok, so that doesn’t tell us too much, but if we repeat the calculation and go back a further week to 15/03, Rt was approx 2.3. Another week back to 08/03 and it was approximately 4.0.
Propagating forward a week from 22/03, Rt then fell to 0.8 on 29/03.
So you can see that Rt fell from 4.0 to 1.4 over the two weeks preceding the full lockdown and then from 1.4 to 0.8 over the following week, pretty much following the same trend regardless.
So, using the data we can see that we could have predicted the peak before the lockdown occurred, simply using the trend of Rt.
In my hypothesis, this was a consequence of limited social distancing (but not full lockdown) and the virus beginning to burn itself out naturally, with very large numbers of asymptomatic infections and a degree of prior immunity.
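The back-of-the-envelope calculation above fits in a few lines of Python, using the commenter’s own figures and the assumed 17-day infection-to-death lag (this is a sketch of the crude week-on-week ratio only, not a proper Rt estimator, which would need the full serial-interval distribution):

```python
from datetime import date, timedelta

DELAY = timedelta(days=17)  # assumed mean lag from infection to death

def crude_rt(deaths_this_week, deaths_prev_week):
    """Week-on-week ratio of daily hospital deaths, read as a rough Rt."""
    return deaths_this_week / deaths_prev_week

# The worked example above: 874 deaths on 08/04 vs 607 on 01/04.
rt = crude_rt(874, 607)                        # ~1.44, i.e. "approx 1.4"
infected_around = date(2020, 4, 8) - DELAY     # back-date the infections
print(round(rt, 1), infected_around.isoformat())  # 1.4 2020-03-22
```

Repeating `crude_rt` for each earlier pair of weeks gives the 4.0 → 2.3 → 1.4 → 0.8 trend the commenter describes.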
Peak excess all-cause mortality was last week – yes, the last week in April. Don’t just look at reported COVID-19 hospital deaths, and don’t just focus on one model.
How do you know that? ONS stats have only just been published for w/e 24th April and they were down a bit on the week before?
Epidemic curves are flat or down in so many countries with such different mitigation policies that it’s hard to say this policy or that made a big difference, aside from two – ban all international travel by ship or airplane, and stop mass-transit commuting. No U.S. state could or did do either, but island states like New Zealand could and did both. In the U.S., state policies range from doing everything (except banning travel and transit) to doing almost nothing (9 low-density Republican states, like Utah and the Dakotas). But again, Rt is at or below 1 in almost all U.S. states, meaning the curve is flat or down. Policymakers hope to take credit for something that happened regardless of their harsh or gentle “mitigation” efforts, but it looks like something else – such as more sunshine and humidity, or the virus just weakening for unknown reasons (as SARS-1 did in the U.S. by May). https://rt.live/
Frank, the peak Flu season is December through February, which is about the same amount of time that we’ve officially been recording deaths in the U.S. from the SARS-CoV-2 pathogen (February through April). Likewise, regarding a lockdown vs. no lockdown scenario comparison, that is also offset by the vaccine vs. no vaccine aspect of these two pathogens.
Please keep in mind that we’ve had numerous Flu seasons where between 60,000 to more than 100,000 Americans have passed away due to it, all despite a solid vaccination program.
“Flu season deaths top 80,000 last year, CDC says”
By Susan Scutti, CNN
Updated 1645 GMT (0045 HKT) September 27, 2018
https://edition.cnn.com/2018/09/26/health/flu-deaths-2017-2018-cdc-bn/index.html
Yes, but the manner in which they count COVID-19 deaths is flawed. Even with co-morbidities they ascribe the death to COVID, and in cases where they do not test but there were COVID-like symptoms, they ascribe it to COVID, according to the CDC.
“The virus has been in Europe since December at the latest” https://www.sciencedirect.com/science/article/pii/S1567134820301829?via%3Dihub
Oh yes. The model is all rather irrelevant now as we catch up on burying the dead.
In point of fact, a ten-line logistic model does as good a job.
Still, academic coding is usually a disaster. I went back to grad school in my late thirties after twenty years of software development. I should have brought a bigger stick.
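For illustration, a “ten-line logistic model” of the kind mentioned above might look like this. The parameters here are entirely made up, purely to show the shape; fitting them to real data would be the actual work:

```python
import math

def logistic(t, K, r, t0):
    """Cumulative deaths at day t: ceiling K, growth rate r, midpoint t0."""
    return K / (1 + math.exp(-r * (t - t0)))

# Hypothetical parameters, for illustration only.
K, r, t0 = 35000, 0.18, 40
cumulative = [logistic(t, K, r, t0) for t in range(90)]
daily = [b - a for a, b in zip(cumulative, cumulative[1:])]
peak_day = max(range(len(daily)), key=daily.__getitem__)  # near t0, the midpoint
```

The daily-deaths curve it produces is the familiar symmetric hump, peaking at the logistic midpoint, with the cumulative total levelling off at K.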
“I should have brought a bigger stick”.
A PART, maybe? “Professor Attitude Realignment Tool”
I have not seen the model, nor do I intend to. One thing that rang alarm bells with me was the statement that R0 was an input into its own calculation, making it a feedback system. These types of dynamical systems are known to exhibit truly chaotic behaviour. Even when not operating in those chaotic regions, the numerical methods must be chosen carefully so that they do not themselves introduce artificial, method-induced pseudo-non-deterministic behaviour (small differences in the initial conditions, or bugs such as use of uninitialised variables).
The modellers argument would be that life is chaotic, and introducing a virus to two separate but identical towns could indeed result in very different outcomes.
Which makes me wonder about the validity of modelling chaotic systems at all…
I think the practice could make sense. The input R0 might describe how communicable the disease is without countermeasures, while the output R0 is the resulting communicability with the countermeasures being modelled. Nowhere in the article does it actually say the output is used as the following run’s input, and while I agree that would be illogical and give huge swings in outputs, there’s no sign that’s being done. Is one of the top five critiques we can make of this code really that, if it were used in a manner in which it’s not being used, its output would go crazy?
The value of your comprehensive reply was completely invalidated when you declined to provide your own calculations!
Yeah you’ve written millions of lines of code in dozens of languages, but didn’t read the review carefully. There’s a difference between randomness you introduce, which you can reproduce with the correct seed, and bugs which give you random results. You can’t just say, ‘oh it’s stochastic’, no, it’s bug ridden. They don’t understand the behaviour of their own model.
Saying it’s crappy because it’s 30 years old is nonsense. You can’t then use your crappy, bug ridden code to influence policies which have shut the economy down.
Unix is 50 years old. And IBM mainframe operating systems even older. And CICS…
Software on which the world runs every second of the year.
The review is a code review, not a review of the mathematical model, so I don’t see that one would expect it to present the substance of the model in any detail.
” There are EU countries which have conducted tests of large random, unbiased, samples of their population to estimate what percentage of their population has had the virus. The number – in case of those countries – comes out at 2%-3%. If the same is true of the UK, then 30,000 deaths would translate to 1 million deaths if the virus infected everybody. ”
Antibody tests indicate those people who have been sufficiently susceptible to the virus for their innate immune systems and existing T-cells to be unable to defeat the SARS-COV-2 virus, resulting in their slower-responding adaptive immune systems generating antibodies against the virus. But there are potentially a much larger number of people whose innate immune systems and/or existing T-cells are able to defeat this virus, and have done so in many cases, without generation of a detectable quantity of SARS-COV-2 specific antibodies. That seems the most likely explanation for why the epidemic is waning in Sweden, indicating a Reproduction number below 1, contrary to even the 2.5% probability lower bound of 1.5 for the Reproduction number there estimated by the Imperial College team using their model (https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/).
It seems to me that most of your comments are excuses for practices that were poor at the time, let alone now. Most of them simply reinforce the view that the code should have been ditched and rewritten top to bottom years ago as being no longer fit for purpose, if it ever was. Opportunities or signals to do so: move from single to multi-thread machines; publication of new/revised libraries with different flavours; discovery of absence of comments (!); discovery that same input does not yield same output (when it’s intended to); etc
Incidentally, “… [no] reason to think these bugs existed in the original code or that they were material.” which is precisely why we need to see the *actual* code that produced the key reports leading to the trashing of our economy and the lockdown with its consequential deaths.
Personally, I don’t think programmers necessarily criticise old code so long as it does what it claims to do. They may not like or understand the style but they can accept that it works. But here’s the thing: if it doesn’t do what it claims, then the gloves are off and they will come gunning not only for the errors but the evident development mistakes that led to and compounded them.
Ferguson said his code was written 13 years ago, not 30. Even so, 30 years ago undocumented code was still bad practice even if that’s how some programmers worked. Unless Ferguson can provide evidence that his original code underwent stringent testing then there’s little reason to trust it. But if it was tested properly the question still remains whether the model it implements is a reliable reflection of what would happen in reality.
Question what his past predictions for BSE, swine flu and avian flu were, compared to reality.
Hint: his predictions were worse than asking Mystic Meg.
the code was written 13 years ago, not 30.
“It was a different time” is no basis for a defence, and your comments are a defence. They either thought their code worked or they didn’t. This shows that they didn’t. That’s all that matters. As for your fear – that’s yours to deal with. Sounds like you’ve got issues to me.
Sorry, but this is an absurd criticism. We have all seen old legacy code that needs refactoring and modernization. Anything that is mission critical for a business, in medicine, in aviation, etc., will often have far more testing and scrutiny applied to it than the actual act of writing the code because either huge amounts of money are at stake, or even more importantly, lives are at stake. For this kind of modeling to be taken seriously, a serious effort should have been made to EARN credibility.
There is simply no excuse for Ferguson, his team, and Imperial College for peddling such garbage. I COMPLETELY agree with the author here that “all academic epidemiology be defunded.” There are far superior organizations that can do this work. And even better, those organizations will generate predictions that will be questioned by others because they are not hiding behind the faux credibility of academia.
dr_t,
Linux is nearly 30 years old. What’s your point again?
And Linux – although legally unencumbered – is essentially a Unix-like operating system. And Unix dates back to 1970.
Given the lead-in remarks here, I wonder if this commenter is just trolling us.
Given the fantastical view of software development 30 years ago, I wonder if he really knows that much about software development? Comment-free code? 15,000-line single-source files? GMAB! Kernighan and Plauger were complaining about standard Pascal’s lack of separate compilation 40 years ago when they rewrote “Software Tools” as “Software Tools in Pascal”, stating that while it might be better for teaching, that lack made it worse than even Fortran for large-scale programming projects.
I have a PhD in biochemistry and currently do academic research in systems biology. I have about 20 years coding experience. This kind of approach to statistical analysis is very familiar. I concur with dr_t.
The stochasticity is a feature, not a bug; it is used to empirically estimate uncertainty (i.e. error bars). The model *should* be run many times, and the mean and variance of the outputs are exactly the right things to report. Highlighting the difference between two individual runs of a stochastic model is only outdone in incorrectness by highlighting a single run.
You’re effectively criticizing the failure to correctly implement a run-to-run reproducibility guarantee that wasn’t important in the first place. Based on your description, it sounds like the same instance of an RNG is shared between multiple threads. Your RNG then becomes conditioned on the load of the cores, because any alteration in the order in which the RNG is called from individual threads changes the values used. If it’s accurate that the problem persists in a single-threaded environment, then it could be the result of a single call to any well-intentioned RNG that used a default seed like date/time. The consequence is only that parameter values are conditioned on one random sequence rather than another random sequence. It’s irrelevant in practice.
Whether, as commenter MFP puts it, “the variation in outputs is a product of intended pseudo-randomness or … a product of unintended variability in the underlying process” is irrelevant. Variability *is* randomness. So intended versus unintended randomness is a meaningless distinction. Non-randomness masquerading as randomness is the only important consideration, and such a mistake results in *less* variation in the results, not more.
The other thing to notice is that the difference between the two runs seems to be (almost) entirely a question of “onset”. That is, the curves are shifted in time.
You’d expect a model to be far more influenced by randomness “at the start” (where individual random choices can have a big effect), and so you shouldn’t be reading very much into the onset behaviour anyhow (c.f. nearly all the charts published show “deaths since 20 deaths” or similar, because the behaviour since the *first* death has a lot of random variation). If this is what’s actually happening (and it certainly looks like it to me) the people making the critique are being fairly disingenuous not to point it out.
To be clear: I don’t think the non-reproducibility (in a single thread environment) is good, and it’s a definite PITA in an academic environment, but I’m doubting it makes any substantial difference to the results. “80,000 deaths difference” looks to be massively overstating things, when more accurate would be “the peak death rate comes a week later” (with the final number of deaths the same).
And even if 80,000 was accurate, it’s only a 20% difference. There are lots of input variables we’d be ecstatic to know about to 20% accuracy (R0, IFR, etc.), so that level of uncertainty should be expected and allowed for anyhow.
There may be other more serious flaws in the model, and I wouldn’t be surprised if some fundamental assumptions are wrong that make a much bigger difference – we are in uncharted territory. But this particular one doesn’t look to be serious.
While we can debate the reviewer’s understanding of stochasticity used in this model, there doesn’t appear to be much debate about the quality of program/model itself. Put another way, it does not matter if the correct ideas were used in the attempt to create a model if the execution was so poor that the results cannot be trusted.
As an academic, I would expect you to be appalled that the program wasn’t peer reviewed. I can only hope that your omission here does not represent a tacit understanding that such practice is customary. But I suspect such hope is misplaced.
All of the modern standards (modularization, documentation, code review, unit and regression testing, etc.) are standards because they are necessary to create a trustworthy and reliable program. This is standard practice in the private sector because when their programs don’t work, the business fails. Another difference here is that when that business fails, the program either dies with it or is reconstituted in a corrected form by another business. In an academic setting, it’s far more likely that the failure will be blamed on insufficient funding, or that more research is required, or some other excuse that escapes blame being correctly applied.
I’m not going to defend coding practices as such in the academy. Just realize that modularization, documentation, code review, etc. become much more burdensome when the objective of the code is a moving target. This is how it is in a basic research environment, where the how is, by definition, not known a priori. How do you plan the programming when the solution is unspecified until the very end? The solution itself is what the research scientist is after; the implementation is just a means to that end. The code is going to carry the legacy of every bad idea and dead end that was pursued during the project.
This will always be a point of friction because once the solution is found it always looks straightforward and obvious in retrospect. A professional coder can always come in after all that toil and failure and turn their nose up at all the individual suboptimal choices scattered throughout. This happens constantly; a researcher develops a novel approach that solves 99% of the unknowns and then a hotshot software engineer comes in and complains that there’s still 1% left and if s/he had written the program (now conveniently armed with all the theory that was the real product of the research) it would run ten times as fast and account for 99.1% of the uncertainty. Come on. It’s a well-known caricature in research environments.
Go ahead, review and rewrite the Ferguson group’s code. Will the program run better? Definitely, probably a lot better. Will it be easier to understand? Yes. Will the outputs be exactly the same? No. Will they differ to such an extent that the downstream political decisions fundamentally change? *Very, very unlikely.*
Look, if you want your opinions to have merit, then carry the burden. That’s what the rest of us have to do. Moreover, it’s very, very likely that much of the code could be modularized for reuse and that the tweaking could be done systematically in a subset of modules.
What you’re describing is akin to an actual scientist puttering around in a lab and then telling the world they have found the solution, while at the same time telling the world it’s too complicated to explain or document along the way, so just trust the results. Just another reason why this process fails the basic principles of the scientific method.
I was programming point of sale and some financial software about 40 years ago so I agree with your point that it was very different – a few K of RAM and a few years later a massive 10 megabyte hard drive!
However Stochastic still equals random and we can’t do what we’ve done on random information.
Good luck with hiding from a coronavirus! It was right across the UK weeks before lockdown and will, in my view, be asymptomatic in between 30% and 60% of the population. My guess is as good as any guesswork produced by predictive, stochastic models!
“I made my own calculation at the end of January, estimating the likely mortality rate of this virus. I’m not going to tell you the number”.
So, in other words, you are just like Ferguson: You made a prediction, which might have been reasonable at the time, but you won’t show your workings (and you won’t even tell us the prediction) but now you’re going to stick with it no matter what. That’s terrible science.
The latest meta analysis of Sero studies:
https://docs.google.com/spreadsheets/d/1zC3kW1sMu0sjnT_vP1sh4zL0tF6fIHbA6fcG5RQdqSc/
shows an overall IFR in the region of 0.2%, higher in major population centres. For people under 65 with no underlying health conditions it’s more like 0.02%. Research from the well-respected Drosten in Germany suggests perhaps 1/3 of people have natural immunity anyway:
https://www.medrxiv.org/content/10.1101/2020.04.17.20061440v1
Did you factor this in?
If your estimate is different to this, it’s looking increasingly likely that your estimate was wrong. Have you back-casted your estimate, perhaps using Sweden or Belarus as references?
Well said, Dr_t!!! Exactly my sentiments – from someone who started FORTRAN modelling 50 years ago and has continued through today.
I would describe this as a simplistic and superficial critique – not really adding anything material to the discussion.
For those who don’t agree with a stochastic modelling approach: tell me where you would get “typical lockdown behavioural patterns” from for a truly probabilistic model. Nonsense!!!
Go back to the drawing board and come up with some useful and materially significant comments.
30 years ago I was developing the Mach operating system (the thing that runs Apple computers today). It was written in C, and I can assure you that it was multi-threaded, modularized, structured and documented. Multi-CPU computers were already commonplace, if not on the desktop. Dining philosophers dates from 1965, and every computer scientist should have come across it at university for the last 50 years. Multithreading has been available to coders since at least the days of Java (1995), if not before (it doesn’t require a CPU with more than one core, just language and/or OS support).
I went to university in 1988, and one of the 1st year modules was concurrent programming. We used a language called Concurrent Euclid (a pascal clone with threading) possibly because threads weren’t well supported or were awkward to use and understand in other languages. Multi-threading programming in mainstream systems has been around for a long while.
Indeed and I remember Modula 2, another Pascal derivative, supported threads. Concurrent programming is pretty old hat really.
Your stupid “””model””” clearly failed to take into account asymptomatic cases (between 60 and 80%). Maybe you ought to look at Iceland since they’ve done testing on 100% of their population, albeit still using low-specificity tests. Say, how come during the same time period in the US, 10% of the population contracted influenza but only 0.3% contracted COVID-19? I thought COVID-19’s R0 was many times higher than the influenza viruses..? Pro tip: infections are WIDELY underestimated, meaning CFR is widely overestimated.
Completely agree.
It should also be noted that this ‘bug’ has been fixed – https://github.com/mrc-ide/covid-sim/pull/121
The very fact that the model code was USED now correctly lays it open to review and criticism, the same as if it were written yesterday, particularly as it has a direct effect on the wellbeing of millions NOW. If it’s not fit for purpose, it doesn’t matter how old or new it is.
Ferguson wrote this on his Twitter account a few months back: “I wrote the code (thousands of lines of undocumented C) 13+ years ago to model flu pandemics.”
So it’s more like 13 years old – not 30 years old.
“30 years ago, there was no multi-threading, so it was reasonable to write programs on the assumption that they were going to run on a single-threaded CPU. ”
Well yes. I am involved in a big upgrade of academic software to multithreading for the same reason. But we are extensively testing and validating this before even considering using it. It sounds like Ferguson’s group did this, found differences that indicated the single-threaded code had wrong behaviour, and then ignored it. So the problem is not lack of multi-threading; it’s lack of good testing and responsible behaviour (i.e. not using code you know is dangerously wrong)?
Very interesting. I know nothing about the coding aspects, but have long harboured suspicions about Professor Ferguson and his work. The discrepancies between his projections and what is actually observed (and he has modelled many epidemics) is beyond surreal! He was the shadowy figure, incidentally, advising the Govt. on foot and mouth in 2001, research which was described as ‘seriously flawed’, and which decimated the farming industry, via a quite disproportionate and unnecessary cull of animals.
I agree with the author that theoretical biologists should not be giving advice to the Govt. on these incredibly important issues at all! Let alone treated as ‘experts’ whose advice must be followed unquestioningly. I don’t know what the Govt. was thinking of. All this needs to come out in a review later, and, in my view, Ferguson needs to shoulder a large part of the blame if his advice is found to have done criminal damage to our country and our economy. This whole business has been handled very badly, not just by the UK but everyone, with the honourable exception of Sweden.
Thanks for your words of wisdom (I truly think they are). Nevertheless, for me (if true) the main point of the critique is: same input -> different output, under ceteris paribus conditions. Best regards and luck in your lockdown.
None of what you say excuses the use to which this farrago of nonsense has been put.
I’m not sure that the code we can see deserves much detailed analysis, since it is NOT what Ferguson ran. It has been munged by theoretically expert programmers and yet it STILL has horrific problems.
I don’t know how you code, but I’ll stand by my software from 40 years ago, because I’m not an idiot and never was. Now … where did I put that Tektronix 4014 tape?
In my field, economics, 61-year-olds like me face the problem that the tools are different from what they were 30 years ago, but we old guys can’t use that as an excuse. To get published, you have to use up-to-date statistical techniques. It’s hard to teach an old dog new tricks, so most of us stop publishing.
Your point that 30 years ago, programs didn’t have to cope with multiple cores sounds legit— but the post above seems to be saying that’s not the main problem, and it wouldn’t even work if run slowly on one core.
The biggest problem, though, is not making the code public. I’m amazed at how in so many fields it’s considered okay to keep your data and code secret. That’s totally unscholarly, and makes the results uncheckable.
“On a personal level I’d actually go further and suggest that all academic epidemiology be defunded. This sort of work is best done by the insurance sector. Insurers employ modellers and data scientists, but also employ managers whose job is to decide whether a model is accurate enough for real world usage and professional software engineers to ensure model software is properly tested, understandable and so on. Academic efforts don’t have these people and the results speak for themselves.”
Perhaps even more significantly, they pay a price when they get it wrong, a check on overreaching idiocy that appears completely lacking in these “advisory” academic roles in government.
See also https://www.youtube.com/watch?v=Dn_XEDPIeU8&t=593s Nassim Nicholas Taleb on having Skin In the Game.
On Monday I got so angry that I created a change.org petition on this very subject.
https://www.change.org/p/never-again-the-uk-s-response-to-covid-19
It sounds like something an undergrad would knock together, but this team is supposed to be the cream of their profession.
If this is the best the best can do then to ‘suggest that all academic epidemiology be defunded’ sounds like a good plan to me. But, sadly, this is shutting the stable door after the horse has bolted.
and exceedingly well funded (by Gates and others). No excuses at all for old or poor code.
none whatsoever.
“Gates”, “poor code”! Now where have I seen that before?
We should not assume that the cream of the academic crop knows how to develop industrial-strength software, or at least well-written code. Proper software development techniques are usually NOT taught in academia.
Thank you. Are the mainstream media capable of covering this? That is what frightens me.
Who is going to be the first to point out that the reason sick people weren’t getting hospital beds is that the models were telling us to expect thousands more sick people than there were? How many people died because of this?
And what about all this new normal talk? All these assumptions life will change for ever built on fantastic predictions which are being falsified by Swedish and Dutch data?
This diktat that we can’t set free young people who are not threatened by the virus because the model says hundreds of thousands would die? All nonsense.
This is the greatest academic scandal in our history.
‘Are the mainstream media capable of covering this?’ Let me think………. ‘No’.
They are certainly capable, but is it in their interests to? Not until the wave turns and is racing back towards them to swamp their current rhetoric. Then they’ll go into self-preservation mode, and make you believe they were asking this all along.
Slightly off topic but I would suggest that some of the climate science work suffers from similar problems and at a comparable scale. Dr Mann’s flawed hockey stick comes to mind; my understanding is that the analysis code has never been released.
I am science trained but a HW guy, not SW. I place most of my trust in measurements, especially ones that can be reproduced by others.
“I would suggest that some of the climate science work suffers from similar problems”
The infamous “Harry_Read_Me” file contained in the original Climate Gate release springs to mind. As I recall, it was a similar tale of a technician desperately trying to make sense of terrible software & coding being used by the “Climate Scientists” – one of whom had to ask for help using Excel…
Currently in court charging defamation but needs to provide disclosure (his ‘code’) and is kind of having cold feet – so proceedings drag on
MUCH more politics in Climate Change! You are simply not allowed to question the basic assumptions..
Er… “much more politics” than the model that has been used to shut down most of the world?
…the assumptions – built into MODELS!!
Any virus has an inherent R0 for a constant set of conditions (input R0). It also has an effective average R0 in the population for the given social conditions (output R0). That explains why R0 appears as both an input and an output.
I would call it
R0 inherent (input)
R0 effective (output)
How do you arrive at the R0 that you feed in?
R0 is a number that is calculated from other model parameters (contact rates, migration rates, recovery rates, death rates, etc.). It has no value to be fed in. The model parameters are fed in, and R0 is some function of these parameter values. Also, too much emphasis is placed on this mystical “R0”, as if an entire epidemic is controlled by one number. This is plainly ridiculous.
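For illustration only (this is the textbook SIR relationship, not anything taken from Ferguson’s code), the point that R0 is derived from the parameters rather than fed in can be sketched in a couple of lines:

```python
def r0_sir(beta, gamma):
    """R0 in the simplest SIR model: transmission rate divided by
    recovery rate. R0 falls out of the parameters; it is not an input."""
    return beta / gamma

# e.g. transmission rate 0.5/day and a mean infectious period of
# 1/0.2 = 5 days give R0 ≈ 2.5
print(r0_sir(0.5, 0.2))
```

A richer model has more parameters (contact rates, migration, mortality), but the principle is the same: R0 is a summary statistic of the inputs, not a dial on the front of the machine.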
Mathematical models of epidemics are just simplified representations of reality. One can fit them to data, once one has data, and the fit may be impressive. But as a predictive tool, in the absence of much data, they may be useless. Having experience of mathematical modelling of epidemics, and knowing their limitations, it is bewildering how countries around the world have imposed all these silly lockdown measures, seemingly because of one computer program by someone who isn’t even a mathematician or programmer.
It seems to me that politicians, afraid of appearing ignorant when their academic “professor” buddy told them everyone was gonna die, and being pressured by an increasingly hysterical media reporting every individual case of coronavirus, decided lockdown was a good idea. Of course, as time will tell, it was never a good idea. And it won’t be in future either, if another new disease comes along.
Perhaps I should re-phrase it as “How do you arrive at the parameters you feed in?”. Ferguson speaks of trying different R0 values over a specified range, so presumably his model does have some notion of R0 as an input. However, it could be that he simply sweeps the model with different transmission parameters and observes which one effectively produces the R0 that he wants.
Either way, he is starting with a range of values of R0 that has been obtained from somewhere. Maybe it’s just that “everyone knows that SARS-Cov-2’s R0 is about 2.5”. But where did that come from?
I think it comes from fitting a ‘model’ (maybe just an exponential formula) to real data (typically at the start of the epidemic – although that’s an assumption in itself) and adjusting its R0 for best fit. As others have observed, if the early data all comes from hospitals, and is affected by arbitrary factors like availability of tests, choice of subjects etc., then that R0 is already very ‘wrong’.
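A sketch of that fitting procedure, under the stated assumptions (clean exponential early growth, and the common SIR-style approximation R0 ≈ 1 + r·D for growth rate r and mean infectious period D; the function names here are my own, not from any published model):

```python
import math

def growth_rate(daily_cases):
    """Least-squares slope of log(cases) against day number: the
    early-epidemic exponential growth rate r."""
    xs = range(len(daily_cases))
    ys = [math.log(c) for c in daily_cases]
    n = len(daily_cases)
    mx = sum(xs) / n
    my = sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def r0_from_growth(r, infectious_days):
    """Crude SIR-style conversion: R0 ≈ 1 + r * D."""
    return 1 + r * infectious_days

# Synthetic 'early epidemic' counts growing at 20% per day
cases = [100 * math.exp(0.2 * day) for day in range(10)]
r = growth_rate(cases)          # recovers ≈ 0.2 from the synthetic data
print(r0_from_growth(r, 7))     # ≈ 2.4 assuming a 7-day infectious period
```

Everything the comment lists – hospital-only sampling, arbitrary test availability, choice of subjects – corrupts `daily_cases` itself, and no amount of careful fitting downstream can repair that.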
Your last paragraph Mr Cabbage is spot on. We seem to have employed and accepted flawed evidence, which inevitably leads to the wrong conclusion. A scenario such as this is too important for us to indulge in such activities.
That’s not a conclusion, that’s a recommendation.
Agree that a proper epidemiology model that is robust and peer reviewed is required and should be a good outcome from this pandemic.
As someone who has worked in the areas of Software Maintenance, Legacy Systems, and Software Testing, and has taught Computer Science to MSc level, I have to say I am appalled. A Computer Science student could do much better than this. Why is Prof Ferguson still being employed by the once prestigious Imperial College?
“Why is Prof Ferguson still being employed by the once prestigious Imperial College?”
About that….
https://www.bbc.com/news/amp/uk-politics-52553229
They could write better code, yeah. But they wouldn’t understand the epidemiology bit. Brogrammers…
But he does. It’s called working in a team.
Interesting – I downloaded what purported to be the Imperial model software from GitHub and it was in Python, full of hard-coded numbers seemingly pulled from the ether. I didn’t see any C++ in there.
I think the original code was written in C++ (Ferguson said C) but I read somewhere that it was ported to R and Python recently – presumably that was the work done by Microsoft.
Although there is R in this, that’s for analysis and display. The one being discussed here is in C++. There is *another* model from Imperial College (the one for “Report 13”) that’s essentially the implementation of an analytical model, and that uses Stan, Python (to set it up) and R (for analysis and display). That’s not the one described here, which is elsewhere on github.
Oh good, it will be fine if Microsoft have done it.
Yes, my reaction precisely. Thank goodness no Microsoft products have ever had any bugs.
The .cpp files are in the src directory: https://github.com/mrc-ide/covid-sim/tree/master/src
It is stunning how awful this all is. The word criminal comes to mind. Thank you so much for this assessment.
No do the same with climate change models
“Clearly the documentation wants us to think that given a starting seed, the model will always produce the same results.”
No! That’s not how stochastic simulations work! Or indeed the real world! In biological systems we literally *expect* a range of outcomes given the same input. You run the model repeatedly, and then report an average and the 95th percentiles of the results.
You absolutely, 100%, Do NOT want a model that gives the same results given the same inputs.
You might be a software engineer, but you’re no biologist.
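For what it’s worth, the “run it repeatedly, report the average and the percentiles” workflow described above looks roughly like this (toy model with invented numbers, not anything from the Imperial code):

```python
import random

def toy_outbreak(seed, people=100, p_infect=0.3):
    """Stand-in stochastic model: number infected out of `people`,
    driven entirely by its own seeded generator."""
    rng = random.Random(seed)       # per-run seed: each run is reproducible
    return sum(rng.random() < p_infect for _ in range(people))

runs = sorted(toy_outbreak(seed) for seed in range(1000))
mean = sum(runs) / len(runs)
low, high = runs[25], runs[974]     # empirical 2.5th / 97.5th percentiles
print(f"mean {mean:.1f}, 95% interval [{low}, {high}]")
```

Note that each individual run is still exactly reproducible from its seed; the spread comes from varying the seed deliberately, not from bugs – which is precisely the distinction at issue in this thread.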
Explain more please: is this entire critique invalid?
Try reading more carefully. She said that given the same starting seed you should get the same results. That’s exactly how a stochastic simulation is supposed to work. If you don’t, you’ve introduced a bug – I mean, ‘non-determinism’.
That’s not what the review says. There was a brief period while there was a bug, but outside that period – both the original code, and the code once the bug was corrected – did not exhibit such behaviour in the execution environment the program was written for – a single threaded single CPU computer.
Name one piece of software under active development with a regular release schedule where every intermediate release is bug-free.
Trying to use monolithic, 15,000-line, 30-year-old (apparently) C code – itself the result of auto-translation from Fortran – on a multi-threaded, multi-core computer, or trying to get it to work in such a setup, is a fool’s errand. Use the code on a single-threaded CPU, for which it was designed, and judge it on its performance in that environment. To be used in a multi-threaded setup, the software would have to be rewritten from scratch using the proper tools for parallel computing (hint: they aren’t tools using shared memory and mutexes).
I’m sorry, but I really am not persuaded by these criticisms which appear to be very superficial. What is the substance of the model? Is it wrong? Why is it wrong? Is it innovative and clever? Why is it innovative and clever?
This is what the reviewer should be asking, and is what would be interesting to know.
Unfortunately, I doubt the reviewer has the necessary skill set to evaluate the model from this – the proper – perspective.
I really don’t care about Ferguson’s coding style. He is not a professional programmer, he is an epidemiological mathematical modeler. How well he does this job is the relevant question, not his coding skills. Especially not his coding skills 30 years ago judged from a modern day perspective.
As an academic, I doubt he has an army of professional programmers working for him.
I’ve seen plenty of brilliant engineers who write programs to help them with their designs. Needless to say, their code is invariably a rat’s nest, but it allows them to do their job well. Is it appropriate for professional programmers to judge those whose job is something else, who use programming only as a tool, and to show off their superior coding ability? I’d like to see such programmers do the engineer’s or the modeler’s job and see how they fare. Then we can all stand there and mock their efforts.
Absolutely spot on dr_t. You clearly know your business. I’m a bit shocked by the critique as well. As a stochastic modelling expert who has written many a ‘rat’s nest’, it is obvious to me that the seed bug she makes a meal of is not an issue at all for this particular code, as it depends on an ensemble of results. Of course, it’s nice to fix it to have reproducibility of individual runs, as the behaviour may confuse novice users, but from the perspective of the end result it changes nothing.
By the way, my rat’s nest doesn’t stay that way. I work with a team of great developers who are pretty mediocre modellers. We work together to produce something that can be consumed by a fairly large body of non-expert users, but if its usage stayed with a handful of experts, we could save the expense of the refactoring and the glamorous user interface.
Absolutely agree with dr_t and earthflattener! You are both spot on with your criticism of the author of this misleading and erroneous report. There seem to be two camps forming here – the “IT geeks” focussed on the purity of code and the true “modellers” who are interested in concepts and theories, which is why they become modellers in the first place.
It’s like learning that a water molecule is H2O – one oxygen with two hydrogens around it – a truly simple “model” of a water molecule. Is this technically correct? Of course NOT! Is it adequate to explain what’s happening without too many – significant – detrimental effects? Of course YES! There’s no mention of neutrinos or other particles, but that doesn’t invalidate the basic model – H2O – and how it’s applied.
Get a grip all you geeks! You just cannot see the wood for the trees – the author of the critical analysis should get some experience in “modelling” before writing critical commentaries. So here’s an exercise for all IT geeks. Do your personal budget for the next 3 years – forecast your income and expenditure. Let’s see if you can figure out your travel and holiday plans. That’s a very simple “financial model”. Do you use Excel or the “back of an envelope”. Does it matter which? NOT AT ALL!
What matters are the ASSUMPTIONS – which have no bearing on whether you use Excel or the envelope.
So easy to see through those who think “code” versus others who think “concepts”! The more I read the author’s article, the more manifest it becomes that she can only think “code” and has no clue about complex scientific, behavioural or economic concepts.
Sad!!!
Isn’t the model generated from code? And if the code is wrong, isn’t the model wrong?
First, it’s a strawman argument to suggest that professional software developers expect some kind of code purity. Second, when you refer to professionals as “IT geeks”, you are attempting to undermine their professional credibility without addressing the merits of their concerns. It’s just banal rhetoric.
Expecting well-organized and documented code is not an expectation of purity. It’s a best practice: when bugs are discovered, they are far easier to track down if the code is orderly and the programmer’s intentions are documented. Every professional “IT geek” understands this.
Look, we all write proof of concept routines when we’re experimenting with different ideas. Novice programmers tend to get so wrapped up with their project that they don’t take the time to rewrite their doodling into something more orderly and reusable. Experienced programmers learn from those mistakes.
Lastly, in case you haven’t noticed, the world runs on quality software. We literally trust our lives to it whenever we fly on a plane, for example. I can’t say the same for much of what the geeks in academia generate.
The point is, you want some way to know whether the code has bugs – whether the model is doing what it was written to do. If it’s poorly documented and untested, and doesn’t reproduce its results (to some level of consistency), you can’t really know.
That’s why this is a problem. Most well written systems using stochastics use pseudorandom numbers, that look random, but are fixed based on the ‘seed’ to the random number generator. With the same seed, they give the same results.
This didn’t, which is a sign that something broken is going on. With C++ that can be a lot of things. One of the most obvious is using an uninitialized variable – e.g. you are summing numbers but forget to set the total to zero at the beginning. Often it will happen to be zero, but sometimes it won’t be. This introduces a bug, and non-determinism, and means your results generally can’t be trusted.
There are actually a lot of good static analysis tools for C++ — I’d love to see them applied to this code base.
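For anyone unfamiliar with the seed point being made above, it takes only a few lines to demonstrate (Python used purely for brevity; the same property holds for C++’s seeded generators):

```python
import random

gen_a = random.Random(42)    # two independent generators,
gen_b = random.Random(42)    # constructed with the same fixed seed

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]
assert seq_a == seq_b        # same seed → identical 'random' sequence
```

If a program seeded this way still produces different outputs from run to run, the divergence is coming from somewhere else – an uninitialized variable, a data race, and so on – which is exactly the reviewer’s complaint.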
the only concept that really matters is results.
Have Dr Ferguson’s results been valid in the past?
Were his predictions for BSE deaths, avian flu deaths and swine flu deaths using this modelling software borne out as roughly correct when looking back at the actuality?
The answer is that Mystic Meg would have done a better job than his software.
“How well he does this job is the relevant question, not his coding skills. Especially not his coding skills 30 years ago judged from a modern day perspective.”
Respectfully, if one’s brilliant mathematical modelling skills are encoded in ways that undermine the ability to produce coherent, consistent and applicable results based on that model’s logic and assumptions, then of what use is that brilliance? A crap translation of the Iliad or Shakespeare destroys its power; that’s what crap software instantiations do to ‘complex’ mathematical models. I would also add the obvious: if the Imperial model had proved even remotely accurate and consistent, it would not be undergoing this level of (literal) disbelief and scrutiny. I’m sorry, but your criticism and critique are both unfair and inaccurate.
Another person who doesn’t read carefully enough. I was responding to the person who claimed, incorrectly, that this is not how stochastic simulation works.
Furthermore, he doesn’t need an army of professional programmers working for him, but he does need people with professional programming skills who can adhere to standard practices. This guy’s model has been the motivation for shutting down the entire UK economy.
Who cares what you’ve seen engineers hack together? The complaint is not about aesthetics, it’s about correctness, reproducibility, transparency. If your model stays in your research group then who cares? But if it’s used for something this important you don’t get to say ‘I’d like to see a programmer do MY job’.
You cannot separate the model in this case from the implementation of it. Besides all that, you’re still wrong. If you read the issue tracker, they thought the issue would be resolved by running it on a single CPU, but it wasn’t. That’s why the reviewer pointed out that they don’t understand how their own code behaves.
If you are running under Windows, then I question whether ANYONE understands what their code is doing in detail. Even with ‘Hello World!”…
dr_t,
I am a programmer, mathematician, and mathematical biologist. Regardless of whether Neil Ferguson’s program bears any relation to reality or not, I just want you to know that your tone in your messages is very rude and is totally unacceptable. Your arrogance completely undermines your credibility. Put your face mask on and stop talking.
The proof is in the reality pudding as they say.
Here are the results of Professor Ferguson’s previous modelling efforts.
Bird Flu “200m globally” – Actual 282
Swine flu “65,000 UK” – Actual 457
Mad Cow “50-50,000 UK” – Actual 177
You do protest a bit much.
“I really don’t care about Ferguson’s coding style. He is not a professional programmer, he is an epidemiological mathematical modeler. How well he does this job is the relevant question, not his coding skills. ”
This is a strange defence. If the code doesn’t implement the model correctly then his coding skills – more specifically his software engineering skills – are highly relevant. If the code hasn’t been validated through rigorous testing and contains bugs, then it’s worse than having no model at all.
Just because spaghetti code exists doesn’t mean it’s the norm in a professional development environment. The bottom line is that if the recommendations from a computer program are going to be used to make decisions that significantly affect the daily lives of millions of people, the friggen program absolutely needs to be as solid as possible, which includes frequent code review, proper documentation, and in-depth testing. Then, it needs to be shared for peer review.
I disagree. A stochastic model is simply a deterministic model that has inputs that are generated randomly.
For example, let’s say I run a stochastic simulation of a random walk. I build a deterministic model that says if my input number is less than 0.5 then take a step left, else take a step right. I then use a random number generator that gives a me a number between 0 and 1. If I generate 5 numbers and they are all less than 0.5, then the person should have taken 5 steps left. If my output says they are anywhere else other than 5 steps left, then my model is broken and running it multiple times and averaging it doesn’t fix the issue.
For example, let’s say my model actually says take a step left if the number is less than 0.5 AND if Neil Ferguson is horny. Unless Neil is horny every second, then my output will be wrong. If Neil is horny only 1/2 of the time, then the random walk will be too far right. Averaging the outputs will not fix that error.
And I work in insurance modeling. The comment about insurance modeling is a bit too kind! Better than what Neil is giving us though.
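The random walk described above, made concrete (a sketch of the commenter’s own example; the point is that per-seed reproducibility and averaging across seeds are compatible, not in conflict):

```python
import random

def random_walk(seed, steps=100):
    """As described above: a number below 0.5 means one step left (-1),
    otherwise one step right (+1)."""
    rng = random.Random(seed)
    return sum(-1 if rng.random() < 0.5 else +1 for _ in range(steps))

# Same seed must give the same walk - that is what makes it testable.
assert random_walk(7) == random_walk(7)

# The ensemble comes from varying the seed deliberately.
ensemble = [random_walk(s) for s in range(500)]
print(sum(ensemble) / len(ensemble))   # unbiased walk: average near 0
```

A systematic bug of the “Neil is only horny half the time” kind shifts every walk the same way, so the ensemble average inherits the bias – averaging only washes out the intended randomness, never the error.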
I don’t think you understood the bug she said was a disaster (the only one she really made a deal of, though in my long comment above I deal with all her points). The error is not systematic. It is simply that saved state incorrectly codes a seed. Using your random-walk example: if you run it once, you obviously started with a seed that generates your vector of random numbers. If you now run it again with the same seed, you will get the same answer. Lose that seed, and running it gives a different answer, but a perfectly legitimate random walk. Since what you are interested in is the average of a large number of walks, the result is the same whether there was a bug in the seed saving or not, because in both cases what you need (and get) is a large number of independent runs. The seed-saving issue does not compromise the independence of the runs, so no problem!
Neal,
Bob’s response to your comment is right. Your comment shows that you do not understand how programming works. Also, given exactly the same inputs, I am afraid you would expect exactly the same outputs, even in biology.
In biological systems, there are so many variables that one can never know all the “input” values. If you measure only a few things in an experiment and get “random” results, this doesn’t prove that life is inherently random. It just shows that you haven’t measured everything.
At a quantum level there may be an inherent randomness in some processes, but not in computer programs, and not on the macro scale of life.
Your simulation has to produce identical results with an identical seed, otherwise it would be impossible to test for correct output. You can use random numbers as the seeds, and run many simulations to model reality, and then see what you get. But if two runs using the same seed produce different output, that’s not a good sign.
They don’t. There was a problem with saving the seed, so in fact when you thought you were using the same seed, in fact you were not. For the rest, then running many runs gives the same answer, whether you succesfully saved state between runs or not.
You are confusing your model with code.
Computer code is idempotent and deterministic; that is, for a given set of inputs it will produce the same results. If this were not so, then computers would be pretty useless. Mathematically this makes sense, as a computer processes binary data with a limited range of mathematical and boolean operators and branch instructions.
Now, you want your model to be non-deterministic. You do that by introducing some randomness, but that has to be under your control – not via some bug in the code, race hazard, or timing event in the thread scheduler or some such. You want to be able to test the model under controlled circumstances, and this clearly wasn’t possible with the code Ferguson wrote.
Getting randomness in computer systems is actually pretty hard and an area of study in itself.
The critique of Ferguson’s code appears to be valid. I don’t think you read it properly, or you didn’t understand what was said. You may be a biologist, but you ain’t no computer scientist.
I am a lay person who does not understand computer modelling….but for such huge decisions to be made without adequate peer review of the data is shocking.
Code should always be unit tested by someone other than the developer. Rule number one.
Many thanks indeed for putting in the time to review the code and to write your informative review.
Thanks for this article – I wrote C code solidly for 5 years – and still do bits and bobs. It does not surprise me one bit – because I knew this was a scam more or less from the “get go” – due to the investigation I did into the “Swine Flu” affair back in 2009. It’s not about public health – it’s about control – and selling pharmaceuticals (tamiflu and vaccines in 2009 and vaccines now). See my report at https://cvpandemicinvestigation.com/ if interested.
So the problem is one of computational mathematics rather than of software development, which is what most programmers do. Thankfully I have some learning in computational mathematics. Unfortunately, the vast, vast majority of computational mathematicians are programming amateurs. Maths is the end goal, and programming is just a means to get there. As a result, their software is crap. Ergo, most of the critique is spot on. HOWEVER… the excuses Imperial gave are valid. Stochastic mathematics does not require exact and reproducible results. Its aim is to simulate trends and patterns. Therefore the key thing to critique is the model. The author has failed to do this, so the entire article is bluster. The author’s background is in database programming, not mathematics, so I wouldn’t be surprised if the subject was outside her area of expertise.
While Stochastic mathematics may not require reproducible results, software which simulates such models does (in order to prove that the software works as expected and that bugs are not introduced). This is why in modelling programs the stochastic model is given a seed to initiate the randomness of the model.
During development we can use the same seeds repeatedly to ensure we haven’t ‘broken’ the model by introducing bugs which cause unexpected outputs.
For production use we use as many seeds as desired to repeatedly run the model (introducing the required level of randomness) before averaging the results.
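That two-mode workflow can be sketched as follows (the names are illustrative, not from the Imperial code):

```python
import random

def simulate(seed, steps=10):
    """Toy stochastic model standing in for the real simulation: the sum
    of `steps` uniform random draws from a seeded generator."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(steps))

# Development mode: a fixed seed must reproduce its output exactly,
# otherwise a change has altered (or broken) the model. In practice you
# would pin a recorded 'golden' value rather than re-run twice.
assert simulate(seed=123) == simulate(seed=123)

# Production mode: sweep many seeds, then average the ensemble.
results = [simulate(seed=s) for s in range(200)]
print(sum(results) / len(results))   # expectation ≈ 5.0 for 10 uniforms
```

The regression check and the ensemble average serve different purposes: the first proves the code still does what it did yesterday; the second is the scientific output.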
This should be a national scandal but the media will probably make little, if anything, of it. We live in a world of staggering absurdity that the govt has been consulting Ferguson on pandemic issues given his poor track record of predictions and inadequate software engineering practices.
On a personal level I suggest that your personal agenda shines through this absolute pile. You should be ashamed of yourself. But hey, you designed a company’s database product, so you must know what you’re talking about. Wow, a complex system has flaws and bugs. O.M.G.
Embarrassing. I almost pity you.
Good article.
What’s troubling about this is nothing that came from this code could possibly have been peer reviewed in a meaningful way.
The code should have been released and written with the right abstractions so the bits of interest to epidemiologists are hundreds of lines of high level Python rather than thousands of lines of C/C++. Then other academics could have tinkered and built on it.
I was expecting this model to be bad but this is an order of magnitude (at least) worse than I’d expected.
And the team has developed a culture dangerously tolerant of mistakes. For instance, if the model isn’t deterministic – consistently giving the same answer for the same inputs – I think it’s wrong or perverse to average the outputs over multiple runs, because you don’t know whether you’re averaging outputs resulting from your methodology or from bugs in the program. It’s bizarre.
As soon as they started getting or being told about non-deterministic results they should have put it on hold until the issue was eliminated.
All this reflects very badly on whoever was funding them; on the journals publishing their results and on the peer review process (if any) in the field of epidemiology and mathematical biology.
As a remark in passing, I suspect that similar critiques could be made of the climate change models if anyone could get them into the public domain.
Similar critiques have been made. Remember the East Anglia stuff?
I agree with you. The peer review process should involve scrutiny of publically available computer code, especially when the work is funded by the tax payer.
Much of the “science” that is published by journals is nonsense. Ionnadis wrote an interesting paper in 2005 called “Why Most Published Research Findings Are False”.
I’m glad to see there has been some critical analysis of the code behind the modelling. I am a hydraulic modeller myself, and ultimately your model is only as good as the data you use and the assumptions you make (garbage in, garbage out). However, I fundamentally disagree with the author’s final point, that modelling is best undertaken by the insurance sector, a sector which has dubious interests at best. At least in my sector, the quality of modelling undertaken by insurers is largely considered to be a ‘black box’, and at times has thrown up interesting results. All the more interesting when you consider that insurers stand to make a huge profit off the results of the modelling. To go the same way with epidemiology would be a mistake in my opinion.
Hi there,
Have you taken a look at this repo as well: https://github.com/ImperialCollegeLondon/covid19model/
They seem to produce a lot of junk code.
He had an excuse: he was too busy shagging.
I believe he deliberately got caught to give him a way to jump ship before this came out.
“Note the phrasing here – Imperial know their code has such bugs”
You quote Matthew Gretton-Dann, but isn’t he just a software engineer working on this refactoring, not someone from Imperial?
@Sue is there any way to reach you via email? There’s a few points I’d like to discuss.
Publicly I’d like to point out that it’s not actually “Imperial College” staff that was replying to concerns raised in those GitHub tickets, but in many cases it’s one Matthew Gretton-Dann.
Mr Gretton-Dann is not a faculty member of Imperial College. In fact, he is an employee of GitHub, which in turn is owned 100% by Microsoft. He joined GitHub/Microsoft late last year.
Neil Ferguson pointed out on March 22nd that GitHub/Microsoft would be taking over salvaging his code:
https://twitter.com/neil_ferguson/status/1241835456947519492
Isn’t this curious? GitHub or Microsoft wouldn’t be an obvious partner for such an undertaking, would they?
Wouldn’t it be mighty interesting to find out why GitHub, which is home to hundreds of very experienced software engineers who’d immediately realize that Ferguson’s code does not actually do what it claims it does, why GitHub would invest resources to give more credibility to this project? Or why Microsoft would?
Thanks for that Phil.
I’m a software developer and this report rang true to me on so many levels.
With regression testing you CANNOT write tests retrospectively to fit the model, because you end up assuming the current pathway’s output is the correct result and marking it as such. This is flawed logic!
Proper test-driven development requires that you write an assert to say something is expected, then write the code required to achieve it. When you run all the asserts in sequence you get a thorough test of the pathways, and any errors that have crept in during further development are highlighted, allowing you to fix them before release. At the end you have a very robust piece of software, and you can remove the assert tests and release it as ‘tested’.
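A trivial test-first sketch of what this comment describes – the assertions are drafted before the code that satisfies them (the example function is mine, not from the model):

```python
def attack_rate(infected, population):
    """Fraction of the population infected; written to satisfy the
    assertions below, which were drafted first."""
    if population <= 0:
        raise ValueError("population must be positive")
    return infected / population

# Assertions written first, pinning down the expected behaviour:
assert attack_rate(0, 1000) == 0.0
assert attack_rate(250, 1000) == 0.25
try:
    attack_rate(1, 0)
except ValueError:
    pass                 # invalid input must be rejected, not mis-computed
else:
    raise AssertionError("expected ValueError for a population of 0")
```

Tests written this way pin the intended behaviour down before the implementation exists, so a later regression fails loudly instead of being silently blessed as the new “correct” output.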
In my 30 year career, I’ve never released a piece of code that didn’t come up to my own personal quality standard. I’m not a scientist, but you would expect that the amateur coders involved would have a certain sense of pride in getting these ‘models’ functioning correctly to begin with. 15 year old code and “we didn’t have time to fix it” is NOT an argument when the model is used to predict real life/death outcomes.
I was horrified that they were unable to fix the bugs and that the results were different on different CPUs. Their argument for taking a mean average is valid, but only over many hundreds of runs if the results are as random as they suggest; I suspect that they ran it no more than 25 times and took that average. It’s shonky to the core. Simply put, I would have been sacked if I had written this software, and any developer worth their salt will tell you the same. Garbage in, garbage out. If it’s not done properly, don’t bother.
Thank you for this report. All very well but what about every other country that applied a lockdown. Surely they didn’t all base theirs on Ferguson and ICL?
Yes they probably did. Neil Ferguson advised the WHO that advised World Leaders on how to respond to a virus threat that apparently Neil Ferguson stated would kill millions.
Considering that most of them went into lockdown while the UK was still overtly acting as though it was in denial while apparently surreptitiously pursuing the herd immunity policy from the start, and before Neil Ferguson ever spoke in public, this has to be the inevitable conclusion.
Ferguson’s estimates were known to the rest of the world well before most countries took significant action, with the exception of Taiwan and maybe Singapore.
I’m in New Zealand, and as a layman I had read about his predictions well before we did anything except halt flights from China (and even that was only a partial stop).
No, the country I’m in did indeed use Ferguson’s report as a justification for the most heavy handed lockdown system in the world.
It was revealed that Drs Fauci and Birx used the Imperial College modelling to help persuade President Trump to close the US down.
https://www.nytimes.com/2020/03/16/us/coronavirus-fatality-rate-white-house.html
Hmmm…being the devil’s advocate here…
If Ferguson or ICL had not issued any report, would not our world leaders or media have found something similar to promote their aims? They were already in a panic over the original outbreak in Wuhan.
Are ICL the only people the Western world, except Sweden, listens to?
How does a single professor have that much power? So sad.
I was saying something similar the other day – before he resigned. Ferguson has seemingly put himself in the position of Robert Oppenheimer: infamous as the inventor of a ‘device’ that may kill millions.
Possibly even worse than that: he has emphasised his own influence over the politicians in various interviews rather than just being happy to remain a scientific adviser.
Before Staats-gate he was already known in the US as well as over here, but now everyone knows about him.
If the government need someone to blame, he’s already volunteered.
There’s a slim chance history will (mistakenly) record him as a hero, but I wouldn’t bet on it.
But good link, thank you!
That’s a broad brush. “A lockdown” in the UK is different from “a lockdown” in Sweden, or Denmark, or Argentina, or India.
For many countries in March, the universal Source of Truth was in fact not the fraudulent Imperial College study but the fraudulent map from Johns Hopkins which, as it happens, did not track confirmed cases of COVID-19 in real time but actually used a Python application to extrapolate and exaggerate public data, instilling fear that the virus was spreading faster than it actually was.
Slightly different version of fraud.
A hysterical media reporting every individual case of coronavirus put huge pressure on people in positions of authority. Then, once one authority started imposing restrictions on travel, it became a competition between authorities to impose stronger and stronger restrictions. This process culminated in governments placing their countries under house arrest, which is the most absurd overreaction possible.
Imagine if every individual case of flu was hysterically reported in detail by the media…
Who actually is the author? They don’t appear to exist on the internet which seems very suspicious to me.
Sue Denim = Pseu Donym
Very well explained, and I agree: insurers should have the funds to do the modelling. Sadly, universities are not accountable, and they should not receive public funding if they are not transparent. Most people are smart, and using jargon to fool them will not work; eventually it will come out.
Ferguson’s model used an infection fatality rate (IFR) of 0.9%[1], whilst more recent data suggests that the IFR should be within the range 0.1%–0.41%[2]. According to my attempt at a cost–benefit analysis[3], this made the difference between justifying a lockdown and not justifying a lockdown.
[1] https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-NPI-modelling-16-03-2020.pdf
[2] https://www.cebm.net/covid-19/global-covid-19-case-fatality-rates/
[3] https://medium.com/@martinvsewell/the-coronavirus-lockdown-in-the-united-kingdom-a-cost-benefit-analysis-acd19f635dae
Interesting method, Martin. I’m not sure it’s the most compelling way to do it, though. Reducing everything to £ by putting a value of £221k on a life saved by lockdown and comparing that against an estimated economic loss of £2.4bn a day caused by lockdown is fair enough, but it’s fraught with difficulties and puts you in the ‘all you care about is £’ camp. That’s not true, but still. I think it’s better to stick to assessing lockdown in terms of death, suffering and hardship. Suffering and hardship are difficult to quantify, so you could just stick with deaths to make the case.
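For what it’s worth, the arithmetic implied by those two figures (both of them the commenter’s estimates, not established data) gives a break-even point. A quick sketch:

```python
# Both inputs are the figures cited in the comment above, not measured values.
value_per_life_saved = 221_000          # £ assigned to one life saved by lockdown
economic_loss_per_day = 2_400_000_000   # £ estimated economic loss per day of lockdown

# Break-even: lives that must be saved per day for lockdown to "pay for itself"
# under this purely monetary framing.
break_even_lives_per_day = economic_loss_per_day / value_per_life_saved
print(round(break_even_lives_per_day))  # prints 10860
```

Which illustrates the difficulty: the whole conclusion hangs on two highly contestable numbers.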
First, I would argue that lockdown doesn’t reduce Covid deaths. Assuming there won’t be a life-saving vaccine any time soon, or any life saving treatment, and that the NHS will have the capacity to cope, then the number of Covid deaths will be the same with or without lockdown. Lockdown just affects when the deaths occur. However, lockdown causes non-Covid deaths, for all the reasons we are familiar with. We don’t know how many but you don’t have to get into arguments about the number – it’s more than 1. QED.
I can’t see how it’s possible to argue with that. You don’t need to be a mathematical biologist to work it out, it’s not a matter of judgement. It’s just bleedin’ obvious. So much so that you could reasonably argue that the Government has wilfully and knowingly inflicted harm on its citizens by continuing its policy of lockdown, once the danger of the NHS being overwhelmed had passed.
Wow this is appalling. This to me is far, far worse than Ferguson breaking the rules he’s endorsed, bad as that is. I want this all over the papers. Why isn’t it????
They shouldn’t even be attempting to repair this code. They should be starting from the *model* *itself* and implementing *it*, not trying to repair a broken implementation.
Which boils down to the fundamental problem with all this. It’s not an issue of “computer” modelling; it’s an issue of modelling. What is the actual model they are using, as opposed to the flawed implementation of it?
I think code used to drive government policy should, at the very least, be testable. This code clearly is not. I’m not demanding beautiful or even clean code (as a top programmer would write); I just expect a minimal level of testability, and that the code be designed with testability in mind. All top programmers agree that testability is a critical requirement for well-engineered code, as used in finance, medicine and critical engineering applications – all code upon which lives, infrastructure and money depend. For example: in testable code, all random numbers are passed in as parameters so that specific functions are deterministic. Without that, one cannot know the code is deterministic – and otherwise how can one know the code models what the designer thought it did? My own experience of code reviewing shows only about 5% of coders write tests and testable code! So I’m not surprised that a non-professional like Ferguson would write the kind of code described by Sue.
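The “random numbers as parameters” point can be sketched in a few lines (hypothetical Python, not Ferguson’s code; the function name and probabilities are invented): inject the random source, and a seeded generator makes the function fully deterministic and therefore testable.

```python
import random

# Hypothetical illustration: the RNG is a parameter rather than hidden
# global state, so a seeded generator makes the function deterministic.
def simulate_infections(contacts, p_transmit, rng):
    """Count how many of `contacts` exposures lead to transmission."""
    return sum(1 for _ in range(contacts) if rng.random() < p_transmit)

# Deterministic: the same seed always yields the same result.
a = simulate_infections(1000, 0.3, random.Random(42))
b = simulate_infections(1000, 0.3, random.Random(42))
assert a == b
```

With hidden global randomness, no such assertion is possible, and a test failure cannot be distinguished from ordinary stochastic variation.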
PS: I’ve not read Ferguson’s code so I’m taking this on trust from Sue.
I notice that the author didn’t talk about experience with stochastic models, and I’m afraid I think it shows. I would agree with dr_t’s comment but choose to make some additional comments on the specific points that she makes.
First, on the non-determinism of issue 116. The author of the criticism doesn’t appear to understand that this is an ensemble method. You do not take the result of a single run as being ‘The Result’. Rather, you run the model thousands of times and take statistics from the ensemble of results. The mean, for example, might provide the best estimate of the number of deaths for a given input set of parameters; the variance of the ensemble gives a measure of the uncertainty. The discussion on issue 116 clearly shows that what was wrong was an issue with the storage of a seed, meaning that if the program is invoked in the manner discussed, a single result is not duplicated. However, if the bug is only at the level of the seed, as both the author of issue 116 and the respondent agree, then this does not matter for the intended usage of the method. It means that you cannot replicate individual instances of the ensemble – but the ensemble average is not affected. It is rather like trying to calculate the probability of heads of a weighted coin. You toss the coin 1000 times and calculate the probability of heads as #heads/1000. Now, suppose you somehow can’t read your writing for what you noted for the 672nd coin toss. What do you do? You can’t rerun the 672nd coin toss. Does this invalidate your estimate of the mean? No – just run it one more time. That is the seed issue.
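The ensemble usage described above can be sketched as follows (illustrative only; the run function and its parameters are invented for the example, and the real model is vastly more complex):

```python
import random
import statistics

def one_run(seed, n=10000, p_death=0.009):
    """One stochastic realization: deaths among n people with per-person risk p_death."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < p_death)

# Run the model many times and report ensemble statistics, not a single run.
runs = [one_run(seed) for seed in range(1000)]
mean = statistics.mean(runs)
spread = statistics.stdev(runs)
# Losing any single run (the "672nd coin toss") barely moves the ensemble mean:
# drop one realization and the mean shifts by at most one part in a thousand of it.
```

On this view, a lost individual realization is like the unreadable 672nd coin toss: regrettable, but irrelevant to the ensemble statistics.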
Now I guess I understand why the author is so upset at this. She is a programmer, but I suspect not a mathematical modeller. Certainly, it seems that she doesn’t ‘get’ the usage of such a model. But why would they fix such a bug in that case? Well, why not? I would fix it if it were my code, even though I know it doesn’t affect results. For my code, it ‘matters’ to fix such a bug because some end users will try to interpret individual runs as though they were meaningful. They like the look of the results, or whatever. No matter how many times you tell them that it is one outcome, they want to believe ‘this realization’, so yes, we ensure that individual runs are reproducible. But that is because we sell the code. If they want to slightly misuse it, then so be it. In the case of this code, it is not misused by its operators. They know it is a stochastic algorithm and use it correctly. An individual outcome is not sacrosanct.
Regarding issue 30, this is even less important to the users of this code. So it doesn’t run correctly on a particular supercomputer – but as the authors of the issue said themselves, it runs fine on their laptops. Again, if you are selling the code, perhaps you need to make sure it is machine-independent – even on Crays! But the small group of users of this software know full well which systems it can run on and which it can’t. Not an issue for the problem in hand.
How about the hotels issue? Well, HotelPlaceType is mentioned 16 times in the method. So the author has decided that its exclusion from one loop is necessarily a problem? Really? Get a grip! The comment most certainly does not lead us to a discussion of R0.
However, since we are at R0, let’s first dispense with the reference she made to the Google machine learning paper. That talks about feedback. Now, it is not clear how R0 is treated in the code – at least not without a lot more reading. The author of the criticism doesn’t shed any light on it either, just trying to swat it with the Google paper. Well, it is a stronger fly than that! There is a base R0 that is a population parameter. That is the one used as input. It is not output from the same code – it is based on population observations (admittedly still a bit ill-defined, although the latest work seems to think it is HIGHER than the value used by Ferguson, weakening even further the herd immunity strategy: https://wwwnc.cdc.gov/eid/article/26/7/20-0282_article). The point is, R0 has an objective meaning at the outset of the disease, prior to any strategy being put in place. It may depend locally on population density etc., but it has a global population meaning, and the software should account for local variation and update it based on behavioural change.
Suggesting all academic funding for epidemiology be defunded? The words of a typical 2.2 student. I’m not saying that insurance companies couldn’t do a good job, though this kind of problem is not the sort that any insurance company actually insures against. Have you tried to make a claim for disruption to your business? The code has no doubt been run tens of thousands of times by experts. She provides no evidence of any bug of substance. It is fit for purpose for those experts. It is not, and does not have to be, some out-of-the-box database or email server of the kind the author has experience of. The problems she works on are extremely important, but mind-numbingly routine. Single-usage scientific code does not fit her paradigm.
Finally, let’s look at a wee bit of empirical evidence. The death rate in New York is rapidly approaching 1,500 deaths per million. It is not clear how many people have been exposed to the virus in NY, but it is probably safe to say not more than 25%. Left to its own devices (i.e. with no interventions), it is pretty much certain that this rate would be reached elsewhere. So, scaling to the population of the US as a whole, this would lead to near enough 500,000 deaths at 25% exposure. As this is a new virus, there is nothing inherent to hold it at 25%, so a factor of 3 more deaths is possible under the worst-case scenario. That leads to 1.5 million deaths – so the worst-case scenario of Ferguson’s model is not so far-fetched after all.
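The arithmetic behind that back-of-the-envelope extrapolation, using the commenter’s own figures (all of which are assumptions, not measured values):

```python
# All inputs are the commenter's assumptions, not established data.
ny_deaths_per_million = 1500     # claimed NY death rate per million
us_population_millions = 330     # approximate US population in millions
worst_case_multiplier = 3        # claimed headroom if exposure rises well past 25%

# Scale NY's per-capita rate to the whole US at the same (~25%) exposure level.
deaths_at_25pct_exposure = ny_deaths_per_million * us_population_millions
print(deaths_at_25pct_exposure)  # prints 495000, i.e. "near enough 500,000"

worst_case_deaths = deaths_at_25pct_exposure * worst_case_multiplier
print(worst_case_deaths)         # prints 1485000, i.e. ~1.5 million
```

The numbers do follow from the stated assumptions; whether those assumptions hold (especially uniform NY-level mortality elsewhere) is the contested part, as the replies below argue.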
Entropy in a model is not a problem, but you must be able to reproduce an individual case in order to ensure developments are not adversely affecting agreed-upon “good” outcomes. Otherwise you would have to run however many ensembles and take statistics for each incremental change, or else risk adversely affecting your model in unknowable ways. Even then it would not be deterministic, and still liable to mishap.
The entropy must be controlled. This is why pseudo-random number generators are used. If all stochastic decisions are sourced from a unified seed, it is possible to prove robustness of the code whilst achieving the appropriate level of randomness. If they are all individually random, you have no control over your own model. Given there are often significant feedback paths in such models, the end result could end up deviating significantly from its design, quite apart from any expectation of the author (therefore, in an uncontrolled manner).
Uncertainty in a model is derived from running multiple times _with different documented seeds_ and from tweaking the input parameters according to their uncertainty, not simply from running in an uncontrolled manner and having faith the outcome is trustworthy, which is essentially what the Prof seems to be doing.
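A sketch of the “unified seed” idea described above (illustrative Python, not the covid-sim implementation; the subsystem names are invented): every stochastic decision is derived from one documented master seed, so the entire run is reproducible subsystem by subsystem.

```python
import random

def run_model(master_seed):
    """Every stochastic subsystem gets its own stream, all derived from one master seed."""
    master = random.Random(master_seed)
    # Child seeds are drawn deterministically from the master stream, so the
    # whole hierarchy of randomness is controlled by a single documented number.
    household_rng = random.Random(master.getrandbits(64))
    travel_rng = random.Random(master.getrandbits(64))
    infections = sum(1 for _ in range(100) if household_rng.random() < 0.1)
    trips = sum(1 for _ in range(100) if travel_rng.random() < 0.5)
    return infections, trips

# Identical master seeds reproduce the entire run; uncertainty is then explored
# by varying the documented master seed, not by uncontrolled randomness.
assert run_model(7) == run_model(7)
```

Swap in an undocumented, uncontrolled entropy source anywhere in that chain and reproducibility, and with it regression testing, is lost.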
The question of how R0 is treated in the code is really the wrong question. The question is how it is treated in the _model_ this code implements. It seems there is no such well-defined model; he has just hacked on old code as fast as he could to get a paper out.
The hotels issue is meaningful because it appears to be random, and may be a symptom of less obvious but more devastatingly random coding decisions elsewhere in the code. There is enough doubt raised by that one observation to cast reasonable suspicion on the rest of the code. Of course, there may be a good explanation (assumption that hotels are closed?), but this is guesswork.
Whether or not the code has been run by experts is irrelevant since it was supposedly written by one. If he is emblematic of the current state of experts, your objection objects to itself. In any case, if the code is so old and undocumented that even the author cannot understand it, how on Earth are other experts supposed to vouch for its robustness and accuracy? Yeah, there is no way.
Regarding code portability, it absolutely matters that it does not run similarly across different platforms, if that is the case. It proves it is neither robust nor reliable. The core of the code is supposedly 13 years old – or do you suppose the good Prof is using the same machine as in 2007?
I gave a long reply, John, but it has gone. Not sure if the mods are super strict. It’s not the only one that went missing. Maybe it’s just very buggy 🙂
The gist of my answer is that the bug was a regression; the initial code did not appear to have it. The only real reason for reproducing an individual case is, as you said, to ensure that the code is correct originally; it can be run safely afterwards without that. Indeed, there are other tests which dispense with the need for repeatability of individual cases, such as ensuring that the ensemble is ergodic – i.e. not changing over time. In a department such as Imperial, with its strong maths focus, I think it is a safe bet that one of the PhD students has tested it – it’s a standard question for stochastic models. Finally, portability is really not an issue, except perhaps, as you said, for updating the in-house system on which it is run. Two mitigations: the portability issue was to do with the compiler, and I’m guessing they have not changed compiler recently. Secondly, PhD students… if you have done a scientific PhD, you know that you will not rely on some third-party code without proper testing. The origins of the code are 30 years old – not 13. It is a very safe bet to say that it is as OK as any other out there.
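One such ensemble-level check (a sketch; “ergodic” is used loosely here, as above, to mean the ensemble statistics are stable, and the run function is invented for the example): compare statistics from two independent batches of runs and confirm they agree.

```python
import random
import statistics

def one_run(seed, n=5000, p=0.01):
    """One stochastic realization: events among n trials with per-trial probability p."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if rng.random() < p)

# Two independent batches of 200 runs each, with disjoint documented seeds.
batch_a = [one_run(s) for s in range(0, 200)]
batch_b = [one_run(s) for s in range(200, 400)]

mean_a = statistics.mean(batch_a)
mean_b = statistics.mean(batch_b)
# Both batch means sit near n * p = 50; a large discrepancy between them
# would signal that the ensemble statistics are not stable.
assert abs(mean_a - mean_b) < 10
```

A check of this shape tests the ensemble behaviour directly, without needing bitwise repeatability of any individual realization.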
“Finally, let’s look at a wee bit of empirical evidence. The death rate in New York is rapidly approaching 1,500 deaths per million. It is not clear how many people have been exposed to the virus in NY, but it is probably safe to say not more than 25%. Left to its own devices (i.e. with no interventions), it is pretty much certain that this rate would be reached elsewhere.”
Nice piece of selection and extrapolation from an outlier there.
Hi Mark, it is not an extrapolation from an outlier, though it is a bit of hand-waving. It is not an extrapolation from an outlier because it is precisely what you expect to happen with a virus that takes hold. An outlier is something inconsistent with your idea/model; this is perfectly consistent. It is also a very big sample – the biggest one, actually. What it does is allow us to see the ‘future’ of other areas if they were to reach the same level of infection.
A virus is a virus. It has no politics. It just spreads. Left alone, this would spread, and without a good reason to see otherwise, the very large sample that is New York is an indicator of what might happen elsewhere. There may be reasons why it won’t happen, but they need to be established. New York is the elephant in the room.
I have opened a github issue calling for the retraction of studies based on this codebase: https://github.com/mrc-ide/covid-sim/issues/165
You are retired right? Why don’t you write a better simulation and publish it so I can review it. What else do you have to do?
An excellent, professional analysis. Depressing reading to be honest. Time some robust modelling was offered from an open source. Would love to see those results.
The job of an Academic is to think out the concepts and specify the ideas to be used in modelling any topic.
Then it’s the job of professional software engineers to implement those ideas in a reliable and rigorous way.
In the same way, physicists produce concepts that engineers use to create things. Einstein worked on mass–energy equivalence, but you didn’t have him on the bomb-making team….