The Facebook Data Scandal: What the real story is and how it affects all of us in research

Why are the questions raised by the Facebook data scandal so important to the research enterprise? Because the use of sensitive information such as medical data in this way is probably closer than we think.

The Facebook data scandal has dominated headlines in the world press and on social media these last few weeks. In summary, a Cambridge University academic gained access to a large data set through the Facebook platform for research purposes several years ago. This data was then shared, without Facebook’s permission, with a company, Cambridge Analytica, which used it for political targeting. Facebook reports that around 87 million people were affected, 300,000 of them Australians. Disturbingly, this is not the first time this company has come under scrutiny for questionable involvement in the electoral process.

There are a number of violations in this chain of events, but the catalyst was certainly that a researcher violated Facebook’s data sharing policies by sharing data with a third party without authorisation. Although the researchers claimed they had deleted the original data set (known as a “training” set), it was too late. Computational models had already been derived from this data, and its legacy lives on through these models, which are of serious commercial value. Although it is reported that no names, addresses or other directly identifying information were released, technology has transformed the privacy landscape, and important questions will need to be addressed about data sharing and what the new norm for online privacy might be.

Facebook holds a wealth of detailed information about its users, much of it real-time data, which has tremendous value for research purposes. Even though the current issue centers on behavioral and personality data used in a political setting, there are parallels with broader research, in particular data-intensive medical research such as DNA sequencing.

There are themes here that will undeniably have far-reaching consequences. In a climate of alternative facts, syndicated information streams and media sensationalism, medical researchers who publish in peer-reviewed journals see themselves as a beacon of reality: a trusted source of truth and inspiration for a better future.

The cost of DNA sequencing (known as “genome sequencing”) has fallen a million-fold in the past several years, fueling an explosion in the creation and storage of information about the genetic basis of human disease. However, interpreting an individual genome sequence to improve health requires the widespread ability to compare it to a compilation of aggregated data, often from a range of sources. Given the variety of disease outcomes, the varied geography and the low frequencies of many of the diseases being studied, data from millions of patients will be required. For this reason, it is common for data to be stored in repositories where they can be used by other researchers for purposes beyond those of the primary research. Patients agree to this through the informed consent process, in which researchers explain the measures they will take to protect patients’ privacy, including security measures (such as removing direct identifiers and replacing them with a code), strict access controls and confidentiality agreements.
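To make the idea of replacing direct identifiers with a code concrete, here is a minimal Python sketch. It is an illustration only, not how any particular repository works: the field names, the `pseudonymise` function and the sample record are all hypothetical, and a real study would add access controls and secure storage for the key table.

```python
import secrets

# Hypothetical list of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymise(records):
    """Split records into coded research data and a separate key table.

    The research data keeps only non-identifying fields plus a random
    participant code. The key table, which maps each code back to the
    direct identifiers, would be stored separately under strict access.
    """
    coded_records = []
    key_table = {}
    for record in records:
        code = secrets.token_hex(8)  # random code, not derived from the person
        key_table[code] = {
            field: value
            for field, value in record.items()
            if field in DIRECT_IDENTIFIERS
        }
        research_record = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS
        }
        research_record["participant_code"] = code
        coded_records.append(research_record)
    return coded_records, key_table


# Made-up example record.
patients = [
    {"name": "Alice Example", "address": "1 Sample St", "diagnosis": "C50"},
]
coded, keys = pseudonymise(patients)
```

Note that the coded record still carries everything that was not listed as a direct identifier, which is why coding alone does not make data anonymous.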

This is necessary because sharing even summary-level (aggregate) genomic data carries some degree of privacy risk. Under specialised conditions, researchers have shown that subjects can be re-identified by combining coded genomic information with other types of publicly available information, and that individual subjects can sometimes be distinguished even in summary-level genomic data. By its very nature, then, your DNA can never really be totally anonymized.
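The general shape of such a re-identification can be sketched in a few lines of Python. This is a deliberately simplified illustration: it links records on ordinary demographic fields rather than on genomic markers, and the field names, the `link_records` function and all of the data are made up for the example.

```python
from collections import defaultdict

# Hypothetical fields that appear in both the coded data and a public source.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link_records(coded_records, public_records):
    """Return {participant_code: name} for coded records that match
    exactly one person in the public data on all quasi-identifiers."""
    index = defaultdict(list)
    for person in public_records:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index[key].append(person["name"])

    reidentified = {}
    for record in coded_records:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        matches = index.get(key, [])
        if len(matches) == 1:  # a unique match singles out one person
            reidentified[record["participant_code"]] = matches[0]
    return reidentified


# Made-up coded research data: no names, only a participant code.
coded_data = [
    {"participant_code": "P001", "postcode": "2010", "birth_year": 1975, "sex": "F"},
    {"participant_code": "P002", "postcode": "2010", "birth_year": 1980, "sex": "M"},
]
# Made-up public data, for example a register or social media profiles.
public_data = [
    {"name": "Alice Example", "postcode": "2010", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode": "2010", "birth_year": 1980, "sex": "M"},
    {"name": "Ben Example", "postcode": "2010", "birth_year": 1980, "sex": "M"},
]
matches = link_records(coded_data, public_data)
```

In this toy data, participant P001 matches exactly one person and is re-identified, while P002 matches two people and stays ambiguous. The more detailed the shared data, the more often a record is unique.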

An often overlooked point is that once data has been placed into a research database, data already in use cannot be removed if patients wish to cease their involvement in the study. So in part, you’re in the study forever. The online setting is similar. Once you sign on to an app using your Facebook account, you may delete the app, but the information about you (and, in many cases, your friends) that has already been provided will still be kept by the app owners. On a platform like Facebook, this can range from your public profile to more specific information such as the pages and groups you follow and like, your friends list and your check-ins.

The questions raised by the Facebook data scandal are so important to the research enterprise because the use of sensitive information such as medical data in this way is probably closer than we think. Healthcare systems are severely strained. The population is ageing and growing, and the incidence of common diseases such as cancer is rising. The emergence of novel technologies and tools for improving the diagnosis, treatment and management of disease makes the health care space a target for established tech giants and startups to mine for investment.

Facebook itself took early steps to discuss a research partnership involving sensitive personal information such as patients’ illnesses, prescriptions and how often they visited the hospital. Although reportedly paused at an early stage, the partnership was with a major medical center and its governing clinical body, and would have linked a patient’s health record with social media information with the goal of more personalised care.

No doubt Facebook has taken a hit in consumer trust. However, the significance of this scandal likely lies not only in how consumers respond, but in how legislators and regulators respond. Many countries, including Australia, have announced independent inquiries into data protection and the data sharing practices of social media platforms.

Consumer trust, and the trust of our collaborators in the academic and commercial world, is make-or-break in medical research. Undoubtedly the researcher gained access to Facebook’s large and protected data set on the basis of his track record and credentials as a researcher at a trusted institution. Such trust underpins collaboration: no single cancer institution in the world has the critical mass to deliver in all cancer areas, and an integrated infrastructure across disciplines is the way forward.

Privacy protection is already proving to be a key headline for Facebook going forward. User permissions, more obvious opt-outs and transparency in data sharing practices and policies will no doubt be at the top of the list. And it brings me to an age-old point which is more relevant than ever in this climate: all databases are either perfectly useful or perfectly anonymous, but never both.

We can’t promise that technology will revolutionise the world (including health care) without also investing in adequate policies and actions to protect the very information used to build it.

The key issue here is that at the center of this scandal was a breach of data sharing practices by an academic researcher. That is the real story.

*Feature image photo credit: william-iven for unsplash
