The $1000 human genome: A reality check

January 21, 2014
Image via Guardian.

“The $1,000 human genome is here. For real this time.” So said Ashlee Vance at Bloomberg Businessweek on January 14.

A day later, in a new piece, Vance was more forthright about the fact that the $1000 genome is a commercial claim: “The $1,000 human genome sequence is here, according to Illumina Inc. (ILMN)”

ILMN. Ah, yes. ILMN is NASDAQ’s label for the stock of Illumina, the leading maker of DNA sequencers. ILMN rose more than 10 percent on Friday, its fourth record day in a row.

“The $1,000 Genome Arrives — For Real, This Time” was also the headline on Matthew Herper’s January 14 piece at Forbes. But Herper was more anchored in the real world than Vance was on the same day. Vance did not examine the claim of a genome for $1000; Herper did. He also pointed out that Illumina’s amazing new sequencer was actually a gaggle of 10 sequencers, hence its name, the HiSeq X Ten. Each of the 10 is priced at a million dollars, and they are sold only as a group, so the total on the invoice is $10 million. Three customers have signed up for March delivery: the Korean sequencing company Macrogen, the Harvard-MIT Broad Institute in Cambridge, and the Garvan Institute of Medical Research in Australia. Illumina is hoping to sell five this year.

What goes into the $1000 genome?

And here’s how you get to the $1000 genome from $10 million worth of equipment. Illumina claims that the HiSeq X Ten can produce 18,000 human genomes in a year, each one sequenced 30 times for accuracy. The company says that volume will result in these per-genome costs: $797 for reagents to run the machines, machine depreciation $137, employee costs for preparing samples and running the machine up to $65.

Let’s assume those figures are accurate. Mick Watson, who does computational genomics at the Roslin Institute in Edinburgh and blogs at Opiniomics, has run an independent analysis and writes, “So actually, I think they might be right in claiming the $1000 genome – if you do 18,000 human genomes per year for four years on each X Ten system.  That’s a lot of human genomes though . . .”
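As a rough check on those figures, the sketch below adds up Illumina's three quoted per-genome components and compares the depreciation number with what you would get by writing off the $10 million list price evenly over four years at 18,000 genomes a year. The straight-line write-off is my assumption, not something Illumina or Watson spelled out, and the function and constant names are purely illustrative.

```python
# Per-genome cost components quoted by Illumina (US dollars).
REAGENTS = 797
DEPRECIATION = 137
LABOR_MAX = 65  # Illumina says "up to $65"

# Assumptions for the cross-check (not stated by Illumina):
# straight-line depreciation of the list price over four years.
SYSTEM_PRICE = 10_000_000
GENOMES_PER_YEAR = 18_000
YEARS = 4


def quoted_cost_per_genome() -> int:
    """Sum of the three components Illumina quotes."""
    return REAGENTS + DEPRECIATION + LABOR_MAX


def straight_line_depreciation_per_genome() -> float:
    """List price spread evenly over every genome sequenced in four years."""
    return SYSTEM_PRICE / (GENOMES_PER_YEAR * YEARS)


if __name__ == "__main__":
    print(f"Quoted total per genome: ${quoted_cost_per_genome()}")
    print(f"Depreciation check: ${straight_line_depreciation_per_genome():.2f}")
```

The quoted components come to $999, and the straight-line check gives about $138.89 per genome, close to Illumina's $137. Halve the volume or the machine's working life and that depreciation share doubles.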

So the $1000 cost depends on high volume. And it does not include overhead (for example, electricity to run the HiSeq X Ten). Nor whatever markup the genetic testing company decides to charge. Nor the costs of analyzing the completed genome, which is just as essential as the genome itself. Not to mention the computing costs. One consultant has called them “horrifying” and we’ll get to that in a moment.

In short, the HiSeq X Ten may be a grand technical achievement. But it is not one that makes it possible for you to walk into a lab and get your genome sequenced for $1000.

Why do we need the $1000 genome?

What’s so important about the $1000 genome? At Nature, Erika Check writes that we need it because true personalized medicine won’t happen until there are large numbers of human genomes for comparison, maybe millions of them. About that she’s certainly right; most diseases are shaping up to be the product of many genes, each contributing a small amount to disease development.  (For the moment we’ll ignore the also-essential contributions of infectious organisms, toxic substances, epigenetics, etc.)  The massive amount of sequencing required to figure out just the genetic complexities isn’t going to happen until genomes are reliably cheap to produce.

But $1000 as a goal for a human genome certainly happened well before there was much talk about personalized medicine. In fact, the $1000 genome became an official United States government project before 2005, when I first blogged about it at The Scientist. I suspect the choice of $1000 was somewhat arbitrary, a nice round affordable number that contrasted mightily with the $3 billion the first human genome cost in 2001. (Yes, that’s a b.)  If you want to get down into the weeds, the National Human Genome Research Institute at the National Institutes of Health has a nice primer on DNA sequencing costs and methods.

The government quest, by the way, is still up and running. The project is known as Revolutionary Genome Sequencing Technologies – The $1000 Genome, and will cost $4 million this year alone in the search for a true (or perhaps I should say truer) $1000 genome.  The latest National Institutes of Health request for applications closed last fall, calling for projects “to develop novel technologies that will enable extremely low-cost, high quality DNA sequencing.  The goal of this initiative is to reduce the cost of sequencing a mammalian-sized genome to approximately $1000.”  The call urges applicants to explore methods other than those currently in use, especially “high-risk/high-payoff applications.”

I suppose the NIH fantasy is a desktop machine suitable for a doc’s office. Or, who knows, maybe a kitchen table. Which is a very long way from the HiSeq X Ten.

Big human genome projects

It didn’t take the HiSeq X Ten to launch ambitious human genome sequencing projects. The U.K. plans to sequence 100,000 National Health Service patients by 2017. And just before Illumina’s sequencer gang was announced last week, the biotech company Regeneron Pharmaceuticals said it was collaborating with Pennsylvania’s Geisinger Health System to sequence the genomes of 100,000 Geisinger patients. Leslie G. Biesecker, chief of the genetic disease research branch at the National Human Genome Research Institute, told the New York Times that it was by far the largest US clinical sequencing project ever.

Regeneron plans to keep costs under $1000 by sequencing only the exome, the 1-2 percent of human DNA containing protein-coding genes. By contrast, the Illumina machines will sequence the other 98 percent as well, yielding complete human genomes. The Times says sequencing 100,000 exomes will probably cost about $100 million over five years. The U.K. project is estimated to cost £100 million and apparently is also planning whole-genome sequencing.

Also note that the HiSeq X Ten will do complete human genomes only. No doggy DNA, no extinct animals, no food crops or pests, no microbiomes. Not even human microbiomes, despite their increasingly obvious central role in human health.

Storage and other post-sequence roadblocks

For another reality check, take a look at the blog of Glenn K. Lockwood, who consults on data-intensive computing. He calls Illumina’s announcement “horrifying because the end of that $1000 sequencing process is only the very beginning of the computational demands associated with genome sequencing.”

What do you do with 18,000 complete human genomes a year, which works out to roughly 340 genomes per week? The weekly haul alone amounts to 30-50 terabytes of data, he says. It’s hard to even estimate these storage costs because they will depend on policy decisions (How long do we keep data? What data do we keep?). But Lockwood takes a crack at it, coming up with $60,000 to store just four weeks’ worth of HiSeq X Ten output. And, Lockwood says, storage is not the only issue. To pick just one example from several, he estimates “an additional $8,800 to $21,000 per week to transform the raw output into aligned sequences.”
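The arithmetic behind those volumes is easy to reproduce. The sketch below assumes a 52-week year with the machines running every week and takes Lockwood's 30-50 terabyte weekly range at face value; the function names are illustrative, not drawn from his post.

```python
GENOMES_PER_YEAR = 18_000    # Illumina's claimed annual throughput
WEEKS_PER_YEAR = 52          # assumption: the machines run every week
WEEKLY_TB_RANGE = (30, 50)   # Lockwood's estimate, terabytes per week


def genomes_per_week() -> float:
    """Average weekly throughput implied by the annual figure."""
    return GENOMES_PER_YEAR / WEEKS_PER_YEAR


def scale_range(tb_range, weeks):
    """Scale a (low, high) weekly terabyte range to a number of weeks."""
    low, high = tb_range
    return (low * weeks, high * weeks)


if __name__ == "__main__":
    print(f"Genomes per week: {genomes_per_week():.0f}")
    print("Four weeks (TB):", scale_range(WEEKLY_TB_RANGE, 4))
    print("One year (TB):", scale_range(WEEKLY_TB_RANGE, WEEKS_PER_YEAR))
```

Run as a script, this gives about 346 genomes a week (close to the 340 figure above), 120 to 200 terabytes of raw output every four weeks, and 1,560 to 2,600 terabytes a year. None of that counts scratch space for alignment or copies of the aligned genomes.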

Lockwood’s bottom line is sobering.  He fears that we might get a different kind of new genomic era from the one we were expecting: A time in which the rate of genomic discovery is driven not by fast, high-volume, less expensive sequencing but instead by how much computing capacity can be summoned to manage the results.

Tabitha M. Powledge is a long-time science journalist and a contributing columnist for the Genetic Literacy Project. She writes On Science Blogs for the PLOS Blogs Network. New posts on Fridays.

  • Ruslan Dorfman

    Great summary! It seems that sequencing platforms develop according to Christensen’s model in “The Innovator’s Dilemma.”

    • Tabitha M. Powledge

      Many thanks; it’s a fascinating subject. The rest of us may be behind the curve, but the data people have been wringing their hands about the flood of genomics data for some time. The Epigenomics Mapping Consortium has complained, “The sheer volume and complexity of consortium-generated data has pushed the limits of existing analytical and visualization tools.” And that was in 2010. See my feature “Behavioral Epigenetics: How Nurture Shapes Nature” here: http://bit.ly/1aHRz2C

  • Amy Williams

    This is a great article. One math check though, if the sequencers generate 30-50 terabytes per week, the monthly rate is only about 200 terabytes with a yearly rate of 2.5 petabytes. Perhaps something is missing in the calculation?

    • Tabitha M. Powledge

      You’re quite right. Google tells me that a petabyte is 1024 terabytes. I simply picked up what Lockwood said without doing the math myself, for shame. I have written to him asking for a clarification. Thanks for catching my flub of basic arithmetic–%^(

      • Tabitha M. Powledge

        Lockwood has quickly replied to my query–and corrected his arithmetic at his post. He writes that Amy is absolutely right. Here’s his email reply, which I’m quoting with his permission: “The 1 petabyte per month came from me mistakenly using the 250 terabytes of transient storage that was required to perform the initial read mapping of ~400 human genomes (see my figure with the green blocks). This 250 TB of “scratch” storage could be re-used with every week’s new batch of data though, so it isn’t additive. The cold storage requirements, at bare minimum, are indeed ~200 TB/month to store the data straight out of the sequencers. It becomes 400 TB/month if you want to store the raw output and the aligned genomes. Both of these are a bit smaller than the 1 PB/month I stated though, so I will update my post immediately. Sorry about that!”

  • Iain Bancarz

    Good article! A few points of clarification:

    1) In principle, you only need to sequence a patient’s DNA once in his or her lifetime. (Leaving aside possible improvements in sequencing accuracy, and cancer genomes which are a whole different story.) So human genome sequencing doesn’t necessarily need to be in a doctor’s office.

    2) That said, Illumina and others are aggressively developing smaller, cheaper desktop machines — not yet for the doctor’s office but certainly within reach of an average hospital. Illumina announced the NextSeq at the same time as the HiSeqX; it got a lot less attention but is very much aimed at this market.

    3) Microbial genomes are a lot smaller than human, roughly 0.1% the size in the case of E. coli. So the massive throughput of a HiSeqX is less important for the microbiome.

    4) Lockwood is correct to say the computational demands are formidable. But long term storage is very much easier if we dispose of (or at least radically compress) the raw data immediately after the genome has been assembled. A finished genome is not that big; the real killer is attempting to retain reads which cover the genome 30 times over and include high-resolution quality scores.