Back in 2006, Netflix announced a public contest for its viewers: a challenge to see who could build the best algorithm for predicting a user's film ratings, based on that user's previous rating history.
To support the contest, Netflix released an anonymized data set containing more than 100 million movie ratings from roughly 500,000 subscribers. Each entry carried a numeric identifier unique to a subscriber, together with the movie rated, the rating given, and the date on which it was rated.
While the contest was underway, two researchers, Arvind Narayanan and Vitaly Shmatikov, were less interested in predicting film ratings than in data privacy. Analyzing the data set Netflix had released, they showed that users could be re-identified by linking its entries to a small sample of readily available data from the IMDb website. As a proof of concept, they matched two specific Netflix subscribers to publicly available IMDb ratings. Because of IMDb's terms of service, the researchers used only a tiny subset of public reviews; even so, by comparing rating dates and the movies reviewed, they linked two IMDb reviewers to records in the Netflix data set and identified them through their IMDb profiles.
One of those reviewers, whose movie rating history was exposed in the Netflix Prize data set, filed a class action lawsuit against Netflix. She claimed that if her sexual orientation were made public, it would harm her ability to earn a living and support her family, and would disrupt her family's life. This failure of anonymization taught Netflix the hard way about lawsuits, public relations crises, and data exposure. The case made clear that an individual's movie-watching history can reveal far more than entertainment preferences: a subscriber's viewing records can point to political affiliations, religion, and even sexuality, and may expose struggles with personal issues or substance abuse. Under legal and public pressure, Netflix settled the lawsuit and cancelled its planned follow-up contest.
Companies that hold billions of data points on millions of consumers command mammoth databases, and those databases make it easy to link, and thereby de-anonymize, disparate data sets. Data brokers that continually combine data sets purchased or collected from various sources offer customized products that connect consumers' online identities with their offline activities. Such companies can also buy supposedly anonymized data sets from multiple places and use automated tools to connect the dots whenever it suits their business needs.
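The linkage attack described above needs nothing more than overlapping quasi-identifiers in the two data sets. The following is a minimal sketch in Python; the data, usernames, and overlap threshold are all hypothetical, invented purely for illustration:

```python
# "Anonymized" ratings: numeric subscriber IDs -> set of (movie, date) pairs.
anonymized = {
    1001: {("Movie A", "2005-03-01"), ("Movie B", "2005-03-04"),
           ("Movie C", "2005-04-10")},
    1002: {("Movie D", "2005-05-02"), ("Movie E", "2005-05-09")},
}

# Public reviews: reviewer usernames -> set of (movie, date) pairs.
public = {
    "film_fan_42": {("Movie A", "2005-03-01"), ("Movie C", "2005-04-10")},
    "cinephile99": {("Movie F", "2005-06-20")},
}

def link(anonymized, public, min_overlap=2):
    """Match anonymous IDs to public profiles by overlapping (movie, date) pairs."""
    matches = {}
    for uid, records in anonymized.items():
        for name, reviews in public.items():
            if len(records & reviews) >= min_overlap:
                matches[uid] = name
    return matches

print(link(anonymized, public))  # {1001: 'film_fan_42'}
```

Even two shared (movie, date) pairs are enough here to tie an "anonymous" subscriber ID to a named public profile; the real attack worked the same way, only at scale and with tolerance for fuzzy dates.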
Given the massive potential for profit and efficiency, companies of all kinds and sizes are amassing data at a rapid pace. With handwritten notes converted to electronic records and file cabinets emptied into databases, organizations are freed from the physical constraints of print documents. Storing data online and in the cloud, they can analyze it at will and retrieve it in milliseconds, something impossible in the past.
As data becomes more usable and instantly transferable, structured data formats and digitization are making it effortless to share and publish, fueling a broader proliferation of data and the evolution of data markets.
But how does this translate into data security? In the quest to accumulate more data, what are organizations doing to protect it in storage and in transit? In light of data breaches occurring almost daily, what systematic approaches can companies take to protect documents and digital media containing sensitive and confidential information?
Digital rights management (DRM) is a proven security tool that lets creators or publishers of confidential documents, such as Word and PDF files, control what authorized users can do with the documents they are entitled to view. By preventing unauthorized redistribution of documents and digital media, and by restricting how a permitted viewer can use the content, DRM makes it very difficult to alter or steal content, offering a more dependable approach than the hit-and-miss strategies of many DLP tools. Document DRM enables organizations to maintain tight control over commercially sensitive data while still sharing documents securely with others.
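The core DRM idea, a publisher issuing a tamper-proof statement of what each user may do, can be sketched with a signed permission token. This is a toy illustration only: real DRM systems also encrypt the content itself, and the key handling, license format, and permission names below are hypothetical simplifications.

```python
import hmac
import hashlib
import json

# Hypothetical signing key, held by the publisher's license service.
SECRET_KEY = b"publisher-signing-key"

def issue_license(user, permissions):
    """Publisher side: sign a statement of what a named user may do."""
    payload = json.dumps({"user": user,
                          "permissions": sorted(permissions)}).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag

def may(payload, tag, action):
    """Viewer side: reject tampered licenses, then consult the permissions."""
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # license was altered after issue
    return action in json.loads(payload)["permissions"]

lic, sig = issue_license("alice", {"view"})
print(may(lic, sig, "view"))   # True  - viewing was granted
print(may(lic, sig, "print"))  # False - printing was never granted
```

The point of the sketch is that the user cannot simply edit the license to grant herself printing or copying: any change invalidates the signature, so the viewer refuses the action.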