The massive collection of data that has resulted from genetic technology is calling into question how much control each of us has over whether we, or others, know about our individual genetic make-up.
The Supreme Court ruled on June 3rd that the government could collect, without a warrant, a DNA sample from someone who has been arrested. DNA is becoming the new fingerprint for people accused of crimes, although its primary legal purpose may not be simply identification. Other people are giving out their genetic material voluntarily, and giving up some privacy about its content, in order to participate in community-based genetic pools. For-profit DNA identity companies such as 23andMe and Family Tree DNA maintain such pools to help members track down distant relatives and assemble their family ancestral histories.
As it turns out, knowledge about an individual’s genotype may not be restricted to those who take our DNA through legal means or those to whom we willingly give it. New statistical methods are revealing people’s DNA make-up without a single cheek cell.
An example of such an effort is embodied in the company deCODE Genetics, for which hundreds of thousands of people across the world have voluntarily provided DNA samples. The Icelandic company has focused on collecting genotypic and phenotypic information (such as medical records) for as many people as possible in Iceland; it currently handles, stores and analyzes data for almost half of that country’s population. The goal of the company is to link genetic signatures to clinical outcomes; the more data available, the more likely the company can find correlations.
How does it work? The company collects data on DNA (genotype) and on clinical outcomes (phenotype), the latter of which can be gathered without any genetic data, and looks for correlations between the two. But many of the research volunteers for clinical studies have not provided access to their DNA for that purpose; a priori it seems the research linking genes to clinical outcomes would be limited by the much smaller number of people who shared both. But deCODE has been able to circumvent this limitation by combining genealogy with clinical outcomes.
Through statistical inference, the company’s researchers estimated the genotypes of those who volunteered only for the clinical research, based on how they were related to people who did volunteer their DNA. All told, some 140,000 Icelanders have allowed deCODE to use their data (genetic or clinical) in hopes of linking genes to diseases and perhaps aiding the development of drug therapies. The result was a slew of publications linking genes to diseases in some of the most prestigious and influential science journals in the world.
But the company then took the statistical inference one step further, imputing and predicting data on all Icelanders, even those who never volunteered clinical or genetic information about themselves. They did this by taking their estimates regarding the 140,000 Icelanders in their database and combining them with genealogical data about the people who had not participated. In other words, if your mother had been in the hospital for a stroke and agreed to participate in a clinical study, while her brother had volunteered his DNA, deCODE would be able to predict your likelihood of a genetic predisposition for stroke. Apparently deCODE has enough data to make educated guesses of genetic make-up and risk for diseases for all Icelanders, whether they gave consent or not. In effect, deCODE’s pool of research volunteers has given the company enough information to take a good guess as to the genotype of every Icelander in the world.
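To make the idea of guessing a genotype from a relative concrete, here is a minimal sketch in Python. It is not deCODE's method, which is not described here in any detail. It is a textbook single-gene calculation under stated assumptions: one gene with a "risk" and a "non-risk" version, random mating in the population (Hardy-Weinberg proportions), no mutation, and a made-up risk-allele frequency of 10 percent. The function and relationship names are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: estimate a person's genotype at ONE gene from a
# relative's known genotype. Assumes Hardy-Weinberg proportions and no
# mutation. All names and numbers here are hypothetical.

# Probability that two relatives share 0, 1 or 2 gene copies inherited from
# a common ancestor (standard values for non-inbred families).
IBD_SHARING = {
    "parent_or_child": (0.0, 1.0, 0.0),
    "full_sibling": (0.25, 0.5, 0.25),
    "uncle_or_aunt": (0.5, 0.5, 0.0),
    "first_cousin": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def population_genotype(p):
    """Chance of carrying 0, 1 or 2 risk copies for a random person."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def genotype_given_relative(relative_copies, relationship, p):
    """Chance that you carry 0, 1 or 2 risk copies, given that a relative
    carries `relative_copies` (0, 1 or 2) and the risk-allele frequency is p.
    """
    share0, share1, share2 = IBD_SHARING[relationship]

    # Nothing shared: you look like a random member of the population.
    none_shared = population_genotype(p)

    # One copy shared: it is a risk copy with chance relative_copies / 2;
    # your other copy is drawn from the population.
    s = relative_copies / 2
    one_shared = [(1 - s) * (1 - p), s * (1 - p) + (1 - s) * p, s * p]

    # Both copies shared: your genotype matches the relative's exactly.
    both_shared = [1.0 if k == relative_copies else 0.0 for k in range(3)]

    return [
        share0 * none_shared[k] + share1 * one_shared[k] + share2 * both_shared[k]
        for k in range(3)
    ]


if __name__ == "__main__":
    # Your uncle carries one risk copy; you never gave a sample.
    you = genotype_given_relative(1, "uncle_or_aunt", p=0.1)
    baseline = population_genotype(0.1)
    print(f"Chance you carry the risk version: {1 - you[0]:.2f}")
    print(f"Chance for a random person:        {1 - baseline[0]:.2f}")
```

Under these assumptions, knowing only that an uncle carries one risk copy raises the estimated chance that you carry it from about 19 percent to about 37 percent. Real imputation combines many relatives and many genes at once, but the principle is the same: a relative's DNA shifts the odds about yours.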
This project could be seen as either a creepy invasion of privacy or an innovative application of publicly available data to address major health trends—or both.
For now, Iceland’s Data Protection Authority (DPA) has ruled that deCODE needs consent from everyone involved (or their blood relatives) before it can use estimates of non-consenting individuals’ genotypes for ongoing research. DeCODE’s plans to estimate the genetic risks of all Icelanders have been suspended, at least temporarily.
Regardless of how the deCODE project fares legally, the implications of its work are profound. Medical records are moving into electronic format, allowing the clinical outcomes of patients to be widely accessible. This information, combined with genealogical information and some volunteer communities providing DNA samples, could open the way for sophisticated statistical software to estimate each person’s DNA make-up and disease risk. The potential benefits for targeting medical research projects and tailoring health responses are incalculable.
What does this have to do with the Supreme Court ruling that DNA can be taken for identification purposes? The Court was concerned with whether collected DNA would be used to find the party responsible for crimes rather than simply to identify the person arrested. Some privacy experts are worried that DNA could replace social security numbers in the U.S. But the data could be used in other ways, including finding genetic correlates of crime and using genealogical studies to estimate (or perhaps incorrectly lay claim to the ability to estimate) the risk of criminal behavior among relatives of criminals.
Our DNA may not be patentable, as genes are truly a creation of each individual human, but our genotype won’t remain private for long, whether we offer it up for analysis or not.
Rebecca Goldin is Research Director for the Genetic Literacy Project, Director of Research for the Statistical Assessment Service (STATS) and Professor of Mathematical Sciences at George Mason University. Dr. Goldin was supported in part by National Science Foundation Grant #202726.