More on the NCSC defence of the NHSx app

More specifically…

As its own disclaimer points out, regardless of what this paper says, what really matters is what the running code actually does. “Code is truth” as they put it. That is why open sourcing it is so important, so we can all read the code and see what it really does, not what the management thought they’d asked the developers for. They say they are going to open source it in the future. When they do, let’s see how much of it they open source. Because it is a centralised system we need to see not just the code on the phones but all the code on the servers in “the back end”. It will be difficult to know if they have opened up absolutely everything. For example there could be GCHQ code in the backend that they don’t open source and we would never know. So there could be parts of the system that don’t get scrutinised, have security or other problems, and we would never know until it was too late.
The whole scheme is based on self diagnosis. This leads to many problems, as they admit, for example malicious users (p5, item 7). Their mitigation is, unbelievably, that “expert clinicians” will be able to spot malicious events, plus checking whether any contacts report symptoms within a few days (p11). So you just need a few mates to get together and you could shut down your employer, school, government department etc. And as for forcing a “target” to self isolate, they acknowledge the problem but “This is future work” (p11, para 7). Oh well.
Self diagnosis includes submitting information about symptoms. That is not just personal data, it is a “special category” of data and it gets special treatment under GDPR. Lots of scary legal details about that here. This means that people like me (software developers working in healthcare) go to great lengths to avoid dealing with “special category data”. NHSx will need to be extremely careful not to open themselves up to legal challenges around this.
A design aim is “It should not be possible for the recipient of a notification to determine which of the people they have been in contact with has asserted symptoms” (p5, point 6). The problem is small data sets. If you spend all day round at your neighbour’s, and that’s the only person you see for a few days, and then you get a notification that “someone” you were close to has just reported symptoms, then it’s safe to say your neighbour has just had their medical privacy breached. They recognise this problem as “the low contact number problem” on p10, but the mitigation is: “suppression of the notification can, subject to a policy decision, be done locally in the app, using simple counting rules (subject to a small population around [the user])”. In other words, it won’t notify you that you spent all day with someone who reported symptoms. Isn’t that the whole point of the app? I can see this aspect of the app having all sorts of ramifications and problems. Those policies and “counting” rules have edge cases, and edge cases are what eventually lead to headlines.
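To make the trade-off concrete, here is a minimal sketch of what a “simple counting rule” might look like. The paper does not publish the actual rule, so the function name, the threshold of 5 and the idea of counting distinct contact IDs are all my assumptions, not the real app’s behaviour.

```python
# Hypothetical sketch of a local "counting rule" for suppressing notifications.
# The threshold and the notion of "distinct contacts" are assumptions made for
# illustration; the NCSC paper does not specify them.

MIN_DISTINCT_CONTACTS = 5  # assumed policy value, not taken from the paper


def should_notify(recent_contact_ids, min_distinct=MIN_DISTINCT_CONTACTS):
    """Return True only if the user has met enough distinct people recently
    that a notification could not be pinned on one individual."""
    distinct = set(recent_contact_ids)
    return len(distinct) >= min_distinct


# One neighbour seen 40 times: the notification is suppressed, which protects
# the neighbour's privacy but leaves the user unwarned.
print(should_notify(["neighbour"] * 40))

# Five distinct contacts: the notification goes out.
print(should_notify(["a", "b", "c", "d", "e", "a"]))
```

Even in this toy version the edge cases are obvious: four contacts gets you nothing, five gets you a warning, and whoever picks that number is deciding who gets told and who doesn’t.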
They use family friendly terms such as users “donating” their data. Sounds like a blood donation, right? I have never seen the word donation used like that and it looks like spin to me. The problem is that when deciding whether to “donate” or not, the user likely doesn’t know what the data is, what will be done with it, how long it will be stored, or who will have access to it. To comply with GDPR, users need to be told all of that, and be able to get a copy of all their data on request from the data controller (who is that in this case?). Perhaps the app will offer up all of that, let’s see.
Interestingly the data includes a “country code” which “allows for multiple countries to interact”. Does that mean England, NI, Scotland and Wales, or other countries? Who else is going to be offered this system? What does “interaction” mean? It gets a mention again on p9: “where multiple countries are collaborating”. Interesting…
The system depends on operators looking at out-bound notifications (p.10):
“Notifications are queued for release and some cases will need to be triaged by humans before being released. This triage is for reasons of evolving epidemiological understanding, based on the data, as well as analysis and the need to filter of suspicious cascades.”
My first thought is that this does not scale. Perhaps this is what the 18,000 contact tracing people are being employed to do? In which case, how do you ensure that 18,000 people never make a mistake?
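As a rough illustration of the scaling worry, here is a sketch of a notification queue where anything that looks like a “suspicious cascade” is held back for a human. The paper does not say how cascades are detected, so the `Notification` fields, the `triage` function and the threshold of 20 are all hypothetical.

```python
# Hypothetical sketch of queue triage: notifications from a single report that
# fan out to many recipients are held for human review. The detection rule and
# the threshold are assumptions for illustration only.
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    recipient: str       # who would be told to self isolate
    source_report: str   # id of the symptom report that triggered it


def triage(queue, max_auto_release=20):
    """Split queued notifications into (released, held_for_human_review)."""
    fan_out = Counter(n.source_report for n in queue)
    released, held = [], []
    for n in queue:
        if fan_out[n.source_report] > max_auto_release:
            held.append(n)      # needs a person to look at it
        else:
            released.append(n)  # goes out automatically
    return released, held


queue = [Notification(f"u{i}", "report-1") for i in range(3)]
queue += [Notification(f"v{i}", "report-2") for i in range(25)]
released, held = triage(queue)
print(len(released), len(held))
```

Every item in the held list is a decision a person has to make, under time pressure, about whether a stranger should be told to stay at home.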
The data honeypot problem (p12, Reidentification risk) is brushed aside: “This is a well understood problem…There is insufficient data here to attract any reidentification risk.” The problem is that insufficient data tends to get supplemented as more features are added, for example because of political demands to know something that is now of interest but wasn’t part of the original design. They admit this: “The risk comes as more data is added to the graph, or commingled with it”. They don’t rule it out, but just say it needs “careful consideration”. You bet it does!
