Companies and governments across the world are building and deploying a dizzying number of systems and apps to fight COVID-19. Many groups have converged on using Bluetooth-assisted proximity tracking for the purpose of exposure notification. Even so, there are many ways to approach the problem, and dozens of proposals have emerged.
One way to categorize them is based on how much trust each proposal places in a central authority. In more “centralized” models, a single entity—like a health organization, a government, or a company—is given special responsibility for handling and distributing user information. This entity has privileged access to information that regular users and their devices do not. In “decentralized” models, on the other hand, the system doesn’t depend on a central authority with special access. A decentralized app may share data with a server, but that data is made available for everyone to see—not just whoever runs the server.
Both centralized and decentralized models can claim to make a slew of privacy guarantees. But centralized models all rest on a dangerous assumption: that a “trusted” authority will have access to vast amounts of sensitive data and choose not to misuse it. As we’ve seen time and again, that kind of trust often doesn’t survive a collision with reality. Carefully constructed decentralized models are much less likely to harm civil liberties. This post will go into more detail about the distinctions between these two kinds of proposals, and weigh the benefits and pitfalls of each.
There are many different proximity tracking proposals that can be considered “centralized,” but in general, the label means that a single “trusted” authority knows things that regular users don’t. Centralized proximity tracking proposals are favored by many governments and public health authorities. A central server usually stores private information on behalf of users, and makes decisions about who may have been exposed to infection. The central server can usually learn which devices have been in contact with the devices of infected people, and may be able to tie those devices to real-world identities.
For example, a European group called PEPP-PT has released a proposal called NTK. In NTK, a central server generates a private key for each device, but keeps the keys to itself. This private key is used to generate a set of ephemeral IDs for each user. Users get their ephemeral IDs from the server, then exchange them with other users. When someone tests positive for COVID-19, they upload the set of ephemeral IDs from other people they’ve been in contact with (plus a good deal of metadata). The authority links those IDs to the private keys of other people in its database, then decides whether to reach out to those users directly. The system is engineered to prevent users from linking ephemeral IDs to particular people, while allowing the central server to do exactly that.
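The defining property of a scheme like NTK is that the server holds each user’s secret key, so the authority (and only the authority) can link ephemeral IDs back to registered users. The sketch below illustrates that property only; it is not the real NTK protocol, and the class names and HMAC-based derivation are hypothetical stand-ins.

```python
import hashlib
import hmac
import os

class CentralServer:
    """Toy model of a centralized scheme: the server generates and
    keeps every user's secret key, so only the server can re-identify
    people from the ephemeral IDs their devices broadcast."""

    def __init__(self):
        self.user_keys = {}  # user_id -> secret key (held server-side)

    def register(self, user_id):
        self.user_keys[user_id] = os.urandom(32)

    def issue_ephemeral_ids(self, user_id, epochs):
        # The device receives a batch of ephemeral IDs for future epochs.
        key = self.user_keys[user_id]
        return [hmac.new(key, str(e).encode(), hashlib.sha256).hexdigest()[:16]
                for e in epochs]

    def link_id_to_user(self, ephemeral_id, epochs):
        # The privileged operation: holding every key, the server can map
        # an observed ephemeral ID back to a registered account.
        for user_id, key in self.user_keys.items():
            for e in epochs:
                candidate = hmac.new(key, str(e).encode(),
                                     hashlib.sha256).hexdigest()[:16]
                if candidate == ephemeral_id:
                    return user_id
        return None

server = CentralServer()
server.register("alice")
ids = server.issue_ephemeral_ids("alice", range(10))
# When a contact reports ill and uploads the IDs they encountered,
# the server can re-identify whose IDs those were:
assert server.link_id_to_user(ids[3], range(10)) == "alice"
```

Regular users, lacking the keys, cannot perform the `link_id_to_user` step; that asymmetry is exactly what makes the design centralized.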
Some proposals, like Inria’s ROBERT, go to a lot of trouble to be pseudonymous—that is, to keep users’ real identities out of the central database. This is laudable, but not sufficient, since pseudonymous IDs can often be tied back to real people with a little bit of effort. Many other centralized proposals, including NTK, don’t bother. Singapore’s TraceTogether and Australia’s COVIDSafe apps even require users to share their phone numbers with the government so that health authorities can call or text them directly. Centralized solutions may collect more than just contact data, too: some proposals have users upload the time and location of their contacts as well.
In a “decentralized” proximity tracking system, the role of a central authority is minimized. Again, there are a lot of different proposals under the “decentralized” umbrella. In general, decentralized models don’t trust any central actor with information that the rest of the world can’t also see. There are still privacy risks in decentralized systems, but in a well-designed proposal, those risks are greatly reduced.
EFF recommends the following characteristics in decentralized proximity tracking efforts:
- The goal should be exposure notification. That is, an automated alert to the user that they may have been infected by proximity to a person with the virus, accompanied by advice to that user about how to obtain health services. The goal should not be automated delivery to the government or anyone else of information about the health or person-to-person contacts of individual people.
- A user’s ephemeral IDs should be generated and stored on their own device. The ephemeral IDs can be shared with devices the user comes into contact with, but nobody should have a database mapping sets of IDs to particular people.
- When a user learns they are infected, as confirmed by a physician or health authority, it should be the user’s absolute prerogative to decide whether or not to provide any information to the system’s shared server.
- When a user reports ill, the system should transmit from the user’s device to the system’s shared server the minimum amount of data necessary for other users to learn their exposure risk. For example, they may share either the set of ephemeral IDs they broadcast, or the set of IDs they came into contact with, but not both.
- No single entity should know the identities of the people who have been potentially exposed by proximity to an infected person. This means that the shared server should not be able to “push” warnings to at-risk users; rather, users’ apps must “pull” data from the central server without revealing their own status, and use it to determine whether to notify their user of risk. For example, in a system where ill users report their own ephemeral IDs to a shared server, other users’ apps should regularly pull from the shared server a complete set of the ephemeral IDs of ill users, and then compare that set to the ephemeral IDs already stored on the app because of proximity to other users.
- Ephemeral IDs should not be linkable to real people or to each other. Anyone who gathers lots of ephemeral IDs should not be able to tell whether they come from the same person.
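The flow described in the list above can be sketched in a few lines. This is an illustrative toy, assuming random 16-byte ephemeral IDs and an in-memory stand-in for the shared server; all names are hypothetical.

```python
import os

def new_ephemeral_id():
    # Random, unlinkable ID generated on-device.
    return os.urandom(16).hex()

class Device:
    def __init__(self):
        self.broadcast_ids = []    # IDs this device has sent out
        self.observed_ids = set()  # IDs heard from nearby devices

    def broadcast(self):
        eid = new_ephemeral_id()
        self.broadcast_ids.append(eid)
        return eid

    def hear(self, eid):
        self.observed_ids.add(eid)

    def report_ill(self):
        # Share only the IDs this device broadcast -- not its contacts.
        return list(self.broadcast_ids)

    def check_exposure(self, published_ill_ids):
        # "Pull" the public list and compare locally; the server never
        # learns whether this device found a match.
        return bool(self.observed_ids & set(published_ill_ids))

alice, bob, carol = Device(), Device(), Device()
bob.hear(alice.broadcast())         # Bob was near Alice
carol.hear(bob.broadcast())         # Carol was near Bob

shared_server = alice.report_ill()  # Alice tests positive and opts in
assert bob.check_exposure(shared_server) is True
assert carol.check_exposure(shared_server) is False
```

Note that the matching happens entirely on Bob’s and Carol’s devices: the shared server publishes Alice’s IDs to everyone and learns nothing about who matched.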
Decentralized models don’t have to be completely decentralized. For example, public data about which ephemeral IDs correspond to devices that have reported ill may be hosted in a central database, as long as that database is accessible to everyone. No blockchains need to be involved. Furthermore, most models require users to get authorization from a physician or health authority before reporting that they have COVID-19. This kind of “centralization” is necessary to prevent trolls from flooding the system with fake positive reports.
Apple and Google’s exposure notification API is an example of a (mostly) decentralized system. Keys are generated on individual devices, and nearby phones exchange ephemeral IDs. When a user tests positive, they can upload their private keys—now called “diagnosis keys”—to a publicly accessible database. It doesn’t matter whether the database is hosted by a health authority or on a peer-to-peer network; as long as everyone can access it, the exposure notification system works the same.
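A heavily simplified sketch of this pattern: a per-day key generated on-device, ephemeral IDs derived from it, and matching performed locally once diagnosis keys are published. The real Apple/Google protocol derives rolling IDs with HKDF and AES; the SHA-256 derivation here is only an illustrative stand-in.

```python
import hashlib
import os

def derive_ephemeral_ids(daily_key, intervals=144):
    # One rolling ID per ~10-minute interval in a day (144 per day).
    # Illustrative derivation only; the real spec uses HKDF + AES.
    return [hashlib.sha256(daily_key + i.to_bytes(2, "big")).hexdigest()[:16]
            for i in range(intervals)]

daily_key = os.urandom(16)           # stays on-device while the user is healthy
broadcast_ids = derive_ephemeral_ids(daily_key)

# Another phone stores the IDs it overhears, without knowing the key:
overheard = {broadcast_ids[42], "unrelated-id"}

# After a positive test, the user uploads daily_key as a "diagnosis key".
# Any device can then expand it and check for matches locally:
matches = overheard & set(derive_ephemeral_ids(daily_key))
assert matches == {broadcast_ids[42]}
```

Because the key is useless for linking IDs until its owner chooses to publish it, the database operator holds no privileged information.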
What Are the Trade-Offs?
There are benefits and risks associated with both models. However, for the most part, centralized models benefit governments, and the risks fall on users.
Centralized models make more data available to whoever sets themselves up as the controlling authority, and that authority could potentially use the data for far more than contact tracing. The authority has access to detailed logs of everyone that infected people came into contact with, and it can easily use those logs to construct detailed social graphs that reveal how people interact with one another. This is appealing to some health authorities, who would like to use the data gathered by these tools to do epidemiological research or measure the impact of interventions. But personal data collected for one purpose should not be used for another (no matter how righteous) without the specific consent of the data subjects. Some decentralized proposals, like DP-3T, include ways for users to opt in to sharing certain kinds of data for epidemiological studies. The data shared in that way can be de-identified and aggregated to minimize risk.
More important, the data collected by proximity tracking apps isn’t just about COVID—it’s really about human interactions. A database that tracks who interacts with whom could be extremely valuable to law enforcement and intelligence agencies. Governments might use it to track who interacts with dissidents, and employers might use it to track who interacts with union organizers. It would also make an attractive target for plain old hackers. And history has shown that, unfortunately, governments don’t tend to be the best stewards of personal data.
Centralization means that the authority can use contact data to reach out to exposed people directly. Proponents argue that notifications from public health authorities will be more effective than exposure notifications delivered by apps to users. But that claim is speculative. Indeed, more people may be willing to opt in to a decentralized proximity tracking system than a centralized one. Moreover, the privacy intrusion of a centralized system is too high.
Even in an ideal, decentralized model, there’s some degree of unavoidable risk of infection unmasking: that when someone reports they are sick, everyone they’ve been in contact with (and anyone with enough Bluetooth beacons) can theoretically learn the fact that they are sick. This is because lists of infected ephemeral IDs are shared publicly. Anyone with a Bluetooth device can record the time and place they saw a particular ephemeral ID, and when that ID is marked as infected, they learn when and where they saw the ID. In some cases this may be enough information to determine who it belonged to.
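The unmasking risk amounts to a simple join between a passive listener’s observation log and the public list of ill IDs. The toy example below uses entirely hypothetical data to show how little the attack requires:

```python
# A passive listener logs when and where it saw each ephemeral ID
# (any Bluetooth device can do this; no protocol access is needed).
observation_log = {
    "id-aaaa": ("09:14", "coffee shop"),
    "id-bbbb": ("12:30", "office lobby"),
    "id-cccc": ("18:05", "gym"),
}

# The public list of ephemeral IDs reported by ill users:
published_ill_ids = {"id-bbbb"}

# Joining the two reveals the time and place of each contact with an
# ill user -- sometimes enough to identify the person behind the ID.
unmasked = {eid: seen for eid, seen in observation_log.items()
            if eid in published_ill_ids}
assert unmasked == {"id-bbbb": ("12:30", "office lobby")}
```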
Some centralized models, like ROBERT, claim to eliminate this risk. In ROBERT’s model, users upload the list of IDs they have encountered to the central authority. If a user has been in contact with an infected person, the authority will tell them, “You have been potentially exposed,” but not when or where. This is similar to the way traditional contact tracing works, where health authorities interview infected people and then reach out directly to those they’ve been in contact with. In truth, ROBERT’s model makes it less convenient to learn who’s infected, but not impossible.
Automatic systems are easy to game. If a bad actor only turns on Bluetooth when they’re near a particular person, they’ll be able to learn whether their target is infected. If they have multiple devices, they can target multiple people. Actors with more technical resources could more effectively exploit the system. It’s impossible to solve the problem of infection unmasking completely—and users need to understand that before they choose to share their status with any proximity app. Meanwhile, it’s easy to avoid the privacy risks involved with granting a central authority privileged access to our data.
EFF remains wary of proximity tracking apps. It is unclear how much they will help; at best, they will supplement tried-and-tested disease-fighting techniques like widespread testing and manual contact tracing. We should not pin our hopes on a techno-solution. And with even the best-designed apps, there is always risk of misuse of personal information about who we’ve been in contact with as we go about our days.
One point is clear: governments and health authorities should not turn to centralized models for automatic exposure notification. Centralized systems are unlikely to be more effective than decentralized alternatives. They will create massive new databases of human behavior that are going to be difficult to secure, and more difficult to destroy once this crisis is over.