This summer I changed my LinkedIn password. I changed it on my Gmail and Facebook user accounts, too. (I had incautiously decided that a single password would serve for all my non-payment-linked user accounts – I shan’t be so reckless in future.) The reason I changed the passwords was, of course, a potential security breach that may have resulted in 6.5 million LinkedIn password hashes being posted to a Russian hacking forum.

Such breaches of data security are perhaps neither uncommon nor surprising, given the vast quantity of personal information held in electronic form across the internet and the malicious determination, or just the idle conceit, of those who would gain access to it.

Of course it is not only from security breaches and social networking sites that individual data can be recovered. In 2008 Homer et al. showed how some clever mathematics can isolate data about individuals from the aggregated results of DNA analysis. Although some further information would be required to pick an individual out of a line-up, this development prompted NIH to change their data access policy.

One individual who was picked out, however, was an insouciant sperm donor at a US fertility clinic. The ‘anonymous’ donor was identified and traced by a 15-year-old who had been born through the use of his donation. That resourceful teenager had identified the donor by sending off for a genetic ancestry test, finding a common surname among the men who shared his Y chromosome and using that to narrow the search around information his mother had been given at the time of treatment.

The collection, sharing and interlinking of biological and health-related data is important for the quality of our healthcare and to support biomedical research from which either we, our families, or people in general could benefit. Data collected through initiatives such as UK Biobank or the ESRC’s Life Study offer unprecedented opportunities to identify the causes and determinants of common diseases and to develop better responses to them. The power to process increasingly vast sets of data and the related impetus to realise their value have encouraged two further developments.

The first is the ‘repurposing’ of data, such as the data contained in NHS patient records, now a matter of government policy. This and other similar initiatives designed to support health research in the public and private sectors (such as GSK’s recently announced intention to release detailed data, including anonymised patient-level data, from its clinical trials), are subject to careful systems design and levels of control over the context into which the data are released.

The second development is the ultimate extension of the first, namely open access to data in scientific research, including, in cases such as the Personal Genome Project and the EBI’s 1000 Genomes project, access to unique individual data. Rather than access being controlled, the data are published without further control over the context in which they might be used.

Collections of personal data have been guided by well-established data protection principles, including lawful processing and purpose limitation – that the data are used only for the purpose for which they are collected (be that purpose ever so broad). Going beyond this conventionally requires further consent or that the data are anonymised.

Both of these mechanisms – consent and anonymisation – have come under significant pressure from information processing technologies and the kinds of research that these technologies enable. From some perspectives they can appear outmoded and, indeed, counterproductive.

There are significant ethical questions to get at here. The shibboleths of consent and anonymisation often obscure deeper interrogation of the real harms that they allegedly protect against. Confidentiality is not an end in itself; losing data is not a harm in itself: there are underlying values at stake and their meaning needs to be examined and reinterpreted in the light of historical developments. The common basis on which we debate, argue and resolve questions about how we share and link data, populated by a jumble of claims about individual rights, utility, public interest, humanistic values, harms and benefits, needs some diligent ground-clearing.

Last week we published a call for expressions of interest, announcing a new work theme for the Council on Genomics, Health Records, Database Linkage and Privacy. Over the next few weeks we intend to assemble a Working Party to be chaired by Professor Martin Richards. There is, admittedly, no shortage of policy, strategic and regulatory advice. The Wellcome Trust, CRUK, MRC and ESRC have formed a new, high-powered Expert Advisory Group on Data Access; the Caldicott2 review is considering issues of NHS patient information governance in the new context; the ESRC’s Administrative Data Task Force is looking at enabling the wider use of administrative data; GSK will form their own independent panel of experts to review requests for their data on scientific merit. So things are moving forward in this area.

The Working Party will need to swim with others in these currents. However, we will also take time to gather evidence and deliberate carefully. We will dig deeper into the ethical background and develop an understanding of what values inform the sharing and linking of health and biological data, especially including genomic data. And we will try to reach practical conclusions to support policy development and research governance. This work will begin in 2013.
