Privacy, Access, and Metadata

I had a very strange feeling when reading 'In Defence of Surveillance Capitalism' by Peter Königs. It was systematic, organised, calm, rich with sources, but somehow the argument didn't add up. Then I realised: this is an academic sealion.

Anyway, I am not going to go through all of it but I was struck by this very hasty argument for an access account of privacy and its application to surveillance capitalism, which is worth a rebuttal:

Macnish (2018) offers a powerful thought experiment in support of the access account. A person forgets her diary in a coffee shop. When she returns to the coffee shop to pick up the diary, she learns that it was in the possession of a stranger. Fortunately, the stranger had the decency not to open the diary. Intuitively, although the diary owner’s ‘data’ were in somebody else’s possession, her privacy seems to have remained intact. A loss of privacy would have occurred only if the stranger had opened and read the diary. This suggests that a person’s privacy is only reduced if actual access to information takes place. If the access account is correct, the assumption that mass surveillance, whether by government agencies or big tech companies, has caused a massive erosion of privacy is false. As Macnish points out, government surveillance involves the collecting of huge amounts data but little actual access. The same is true of ‘surveillance’ by tech companies. It is very unlikely that someone is personally reading your emails or WhatsApp messages, scrutinizing your Google search prompts, or judging your YouTube habits. The collection and processing of these data are performed by computers, not humans. The placement of targeted advertisements, for instance, is entirely automated. It does not involve an employee of an SC company personally sifting through your browser history and deciding which product you might be interested in. Therefore, if access is required for a loss of privacy to occur, ... companies collecting and using massive amounts of user data does not translate to a massive reduction in people’s privacy.

First, we should note how access has become access by a human. Would our intuitions about the simplistic thought-experiment remain the same if the stranger had scanned the diary and kept a copy on his computer, without reading it himself? Secondly, the companies in question do not merely collect and store the data; they use it for their own ends. And those ends are human ends. The computers are instruments serving people.

So let's change the story. The stranger scans the diary without reading it. He then gets his computer to work out the diary-owner's birthday and send her a birthday promotion from his company. He still hasn't read a word of the diary and may be careful not to find out anything about her himself. Even so, she may already find this a bit creepy. But what if he sells the scanned diary to another company, where again no human is going to read it but their computer tries to identify from the diary the birthdays of her friends and sends promotions of suitable gifts? The scan is then sold on to another company, which collects diaries and is drawing up maps of friend networks so they can identify who has most influence in our original diarist's friend network and target them with promotions for products they think they will recommend to the rest of the network. And the diaries and network graphs are also bought by a company which buys till receipts, so they can work out which friends go with which other friends to which coffee shops and bars. They use this to create targeted promotions for a new bar opening in the neighbourhood. Another company has bought up bus tickets and is correlating those journeys with the diaries to work out who lives where. And so it goes on.

That is the reality of surveillance capitalism. Sure, no human has read the diary, though lots have bought and sold copies of it. And very little of the content of the diary is used by these companies. What they really want and use is the 'metadata' that can be extracted from it. They want to know which bar our diarist went to, when and with whom. They don't need to know what they drank or talked about. Metadata is still private information.

Of course, the content will be used as well, if it is available. And the business model is not limited to selling advertising: all this data has been used to train LLMs as well, which need the content. In fact we could have just tweaked that original thought-experiment by saying the stranger scanned the diary and used it to train an AI he was making which would write diaries.

Hopefully being much more realistic about the data extraction practices of Big Tech shows that the original thought-experiment proved nothing. However, I will grant our sealion that we have not yet shown that these practices amount to a serious loss of privacy. For sure, billions of people seem comfortable with them, or at least with the trade for free digital services (though we should ask how many actually know the extent of the data extraction, processing and reselling). But that may just tell us that they care little about their privacy, not that there is no loss of privacy.

My own view is that even if we do accept the access account of privacy, then if someone's private information is used by another person for their own ends, that is a loss of privacy, whether or not the user themselves has cognitive access to the information. Use is a form of access. In the case of digital data, this use rests upon making and distributing multiple copies, and I hope that even the most ardent supporter of an access account of privacy would concede to the control account that making copies of private information has some negative impact on privacy, even if those copies are never read by a human.

Of course, the reality is that under the banner of privacy people care about access in the narrow sense of human cognition, but also about controlling their private information and how it is used. The problem with the data extraction business model is that once the data has been extracted, it is used in a variety of ways which pay no respect to the interests and well-being of the original data source and only serve the ends of the 'owners' of that data. Many of those ends are fairly harmless, irritating at worst, but there is still something ethically dubious about the whole set-up. And there are some actors in the system whose ends are far from harmless, but that is a different argument.



More from Tom Stoneham