Last week, a Dutch artist decided to celebrate the birthday of Eric Arthur Blair (better known as George Orwell) by placing party hats on the CCTV cameras along a street. Wikipedia describes 1984 as a dystopian novel, yet the constant surveillance that the novel presents as fiction is now a reality. If anything, we've become so accustomed to cameras watching our every move that we've learnt to ignore them.
Cameras wearing party hats should remind us of the surveillance state we live in. I don't mean that the surveillance is necessarily malign; it can be benign, even innocuous, to the point that we submit to it willingly and appreciate the results it delivers.
In fact, everything we do is tracked and studied. Buy groceries with a credit card, and the store records exactly what you bought and can predict with remarkable accuracy what you'll buy next. Search for something on the net, and Google will hound you with ads for similar stuff. Gmail, too, does a terrific job of bombarding you with ads all over the net. Don't believe me? Send a friend an email about some travel plans. This was the premise of a hilarious (if unverified) office prank in which an entire office colluded to send a co-worker emails with "SPOON SPOON SPOON" written in white text at the end of every message, because he had complained about his spoons going missing. Soon, the poor guy had ads for spoons following him around the web.
Closely associated with this loss of privacy is a hot research topic called "Big Data". As the name suggests, it refers to collections of data so large that they are impractical to manage with conventional database management tools. These datasets grow so large because they are built from raw dumps of the activities of enormous numbers of people.
Big data is useful because it promises to help build a better world. For example, if we had access to the travel habits of every person, we could predict traffic patterns and actively adjust road conditions (synchronised signals, variable speed limits) to make the road network more efficient. Los Angeles reports a 5 mph increase in average road speed since it synchronised its traffic signals, and that was achieved without the full information that Big Data provides.
Google Now is an example of a service that uses big data. It's possibly the most intrusive privacy violation that exists today, but it gives the best results. Google Now uses the phone's sensors to detect whether I'm walking, biking, or riding in a car or bus, and then tailors the information it shows me: travel times to and from work, weekend recommendations for the kind of movies I like (generally sci-fi), and, at the end of every month, how far I've walked compared with the previous month. Useful information, but risky.
So, what are my objections to companies collecting so much data about us? The first is that our goals differ from those of the companies that serve us. A company wants to maximise its profits; we want to minimise our spending. Suppose, for a moment, that Google ran a store. I've been researching noise-cancelling headphones for a while (I just don't think I can afford to spend $400 as a grad student; that's almost a week's income). Google already pushes ads for noise-cancelling headphones at me, but if it had a store, it could quote me a higher price, knowing how much I want them, and I might just buy them impulsively anyway. Target and other retail chains already do something similar. Target worked out from a teenage girl's purchase patterns that she was pregnant before her father knew. If you read the linked article, you'll see how Target sends coupons for things you may need, which tend to lead to impulse purchases of things you don't actually need. At no point does Target's algorithm need to know who you actually are; it simply identifies each customer by a number tied to a credit card.
I've studied machine learning, and I'm fascinated by the opportunities that "Big Data" presents. Computers can now process vast amounts of information and spot patterns that we humans miss; they can group and cluster data and draw inferences far faster than we can. They can also find patterns that make no apparent sense to us, or that we would never have guessed existed. In the Target example, coupons for maternity products were triggered by purchases as innocuous as vitamins (along with other items, of course; vitamins would be just one part of the pattern).
However, I'm also wary of the shortcomings of the "Big Data" model. Computers can certainly identify patterns, but those patterns may be grossly wrong, or based on insufficient information. For example, two people may share a credit card, leading the computer to a badly wrong prediction. In most cases such errors are harmless: if I were to receive coupons for steak, I'd just throw them in the trash.
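To make the shared-card failure concrete, here is a toy sketch of purchase-pattern scoring in Python. It assumes, purely for illustration, that a retailer keys purchases by card number and sums per-product weights into a single score; the products, weights, threshold, and function names are all hypothetical and are not Target's actual model.

```python
from collections import defaultdict

# Hypothetical product weights, invented for illustration only.
PRODUCT_WEIGHTS = {
    "prenatal vitamins": 0.4,
    "unscented lotion": 0.2,
    "cotton balls": 0.1,
}
THRESHOLD = 0.5  # hypothetical score at which maternity coupons are sent


def score_by_card(purchases):
    """Sum product weights per card ID. The model only sees card IDs, never people."""
    scores = defaultdict(float)
    for card_id, product in purchases:
        scores[card_id] += PRODUCT_WEIGHTS.get(product, 0.0)
    return dict(scores)


def flag_for_coupons(purchases):
    """Return the sorted card IDs whose score reaches the threshold."""
    return sorted(card for card, s in score_by_card(purchases).items() if s >= THRESHOLD)


# Two hypothetical people share card-42; neither one's purchases alone reach the threshold.
person_a = [("card-42", "prenatal vitamins")]
person_b = [("card-42", "unscented lotion"), ("card-42", "cotton balls")]

print(flag_for_coupons(person_a))             # []
print(flag_for_coupons(person_b))             # []
print(flag_for_coupons(person_a + person_b))  # ['card-42']
```

Because the card number is the only identity the model has, the two shoppers merge into one profile, and the combined score triggers a prediction that fits neither of them.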
Some mistakes from the "Big Data" model are dangerous, however, particularly when the government is involved. The government is a powerful entity, often an ass, with the power to really screw people over, repeatedly.
It's impossible to talk about this without referring to current events: both PRISM and the collection of Verizon call meta-data. President Obama argued that the data collection was harmless because it captured only the meta-data, not the content of the calls, and that this was neither invasive nor a violation of privacy.
I don't really expect politicians to understand technology, just as I'm flustered whenever I'm handed a legal document to sign. Still, I'll try my best to explain the shortcomings of a model that collects and analyses call meta-data with a simple example, set in the context of Indian law but with parallels in any legal system.
Betting is illegal in India, yet betting on cricket matches is a big business there. Let's say that the government wants to crack down on bookies. I propose the following system, which should work provided the government has access to call meta-data over an extended period of time.
We'd expect a bookie to receive more calls while a cricket match is on, or around the time one is scheduled, and to make a lot of calls after the match, possibly to collect dues. However, I don't even need to spell out this behaviour. I'll simply ask a computer to analyse the call patterns of known bookies and learn how they vary around the times cricket matches were played. Further, since most punters stick with the same bookie over a series of matches, the computer can analyse those patterns as well. Using this information, it can print out a list of people it suspects, with a high degree of confidence, of being bookies. Law enforcement will then crack down on them: get warrants, search their premises, confiscate their property, the whole deal. And we've just saved the country from illegal gambling.
However, computers make mistakes, and more often than not these mistakes arise from incorrect assumptions about the data being supplied. The reasoning in the previous paragraph is flawed because another class of people will have similar call patterns: cable service providers. In a country where everyone watches cricket and cable service is poor, lots of people call their cable providers around match time to complain about bad reception or broken cables. The system will wrongly flag cable providers as bookies, and until law enforcement realises its mistake, those providers will go through hell several times over (trust me, you don't want to go to jail, and in India you never want to run into the police as a suspect).
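The detection scheme described above, and its blind spot, can be sketched in a few lines of Python. This is a simplification under stated assumptions: instead of learning from known bookies, it applies a fixed, hypothetical rule (flag any number that receives at least 60% of its incoming calls during match hours), and the phone numbers, hours, threshold, and function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical match windows, as hour indices counted from midnight on day one
# (two afternoon matches on consecutive days).
MATCH_HOURS = {14, 15, 16, 38, 39, 40}

# Hypothetical call meta-data: (number called, hour of the call). No content, just who and when.
CALLS = [
    ("bookie", 9), ("bookie", 14), ("bookie", 15), ("bookie", 39), ("bookie", 40),
    ("cable_provider", 15), ("cable_provider", 16),
    ("cable_provider", 38), ("cable_provider", 39),
    ("grocer", 10), ("grocer", 15), ("grocer", 33), ("grocer", 35),
]


def match_call_ratio(calls, match_hours):
    """Fraction of each number's incoming calls that arrive during a match."""
    total = Counter()
    during_match = Counter()
    for callee, hour in calls:
        total[callee] += 1
        if hour in match_hours:
            during_match[callee] += 1
    return {callee: during_match[callee] / total[callee] for callee in total}


def flag_suspects(calls, match_hours, threshold=0.6):
    """Flag every number whose match-time call ratio reaches the (hypothetical) threshold."""
    ratios = match_call_ratio(calls, match_hours)
    return sorted(callee for callee, ratio in ratios.items() if ratio >= threshold)


print(flag_suspects(CALLS, MATCH_HOURS))  # ['bookie', 'cable_provider']
```

The bookie and the cable provider come out indistinguishable: nothing in the meta-data separates them, and the flaw only shows up once you know, from outside the data, that cable complaints also spike during matches.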
Law enforcement officers want to detect and stop criminal behaviour. Often, officers suffer from confirmation bias: they seek out facts that confirm their hypotheses. If a computer system says, with high confidence, that someone is engaged in illegal activities, officers will go looking for evidence of those activities, which can lead to intrusive and often illegal searches. The old maxim that you have nothing to fear if you've got nothing to hide doesn't really hold. Everyone has something they don't want made public, and it need not be a guilty secret: everybody closes the stall door in a toilet.
Of course, many of these suggestions may be impossible or impractical in today's world without sacrificing a lot of convenience. For example, while I recommend cash transactions, I almost exclusively use cards because they're so much more convenient. This blog uses Google Analytics to track reader behaviour so that I can write articles that more readers enjoy, though honestly, I couldn't care less about what you read: this blog is mainly for my own satisfaction, and a way for me to collect and archive my views on a variety of topics.
Just be wary, and actively make your opinion known, so that we can have better services with better privacy in the future. It's possible, and we need to let companies and the government know that we value our privacy.
As an aside, Google had hinted at its involvement with PRISM all along: Chrome's incognito mode has always warned us about "surveillance by secret agents". :)