Scientists savage each other over ‘Junk DNA’ study while journalists mis-report the science

| February 25, 2013 |
ENCODE

Is most of our genetic material akin to evolutionary detritus that has accumulated over the course of millions of years during which single-celled organisms evolved into modern humans, as most geneticists have long believed? Or is it something a great deal more functional, as a recently-fashionable theory hyped by the media has it?

To quote from one prominent scientist watching this fast-developing Battle Royale play out: The critics of the new theory “haven’t just poked a hole in the balloon, they’ve set it on fire (the humanity!), pissed on the ashes, and dumped them in a cesspit.”

Rough stuff. Over the top. And it’s just one of numerous such statements. Here’s the skinny.

Last fall, scientists with the international ENCODE (the Encyclopedia Of DNA Elements) consortium, launched by the National Human Genome Research Institute in 2003, announced what they said was a breakthrough in identifying all the functional elements in the human genome sequence. Published across 30 papers in Nature and companion journals, the consortium claimed that long stretches of DNA, previously dismissed as “junk”, are in fact crucial to the way our genome works.

“In 2000, we published the draft human genome and, in 2003, we published the finished human genome and we always knew that was going to be a starting point,” said Dr. Ewan Birney, of the European Bioinformatics Institute near Cambridge, one of the project’s principal investigators. Birney in fact reflected mainstream views when he commented, “We always knew that protein-coding genes were not the whole story.” It’s what was said about the “whole story” and how the media botched the coverage that morphed into “the story.”

The consortium claimed to have identified more than 10,000 new “genes” that code for components that control how the more familiar protein-coding genes work. Up to 18% of our DNA sequence is involved in regulating the less than 2% of the DNA that codes for proteins, they asserted. Encode scientists said about 80% of the DNA sequence can be assigned some sort of biochemical function.

The Encode analysis was actually tedious reading designed for the genetic über-insider. After all, it was a sprawling project designed to deliver a reference manual for the genome. Imagine the scientists’ surprise when, reveling in the attention, they morphed into overnight global rock stars.

Nature played up the story big time, with a number of firsts: cross-publication topic threads, a dedicated iPad/eBook app and website, and a virtual machine. Journalists, by and large, were rapturous, devoting pages of articles, elaborate mockups and online tutorials to explaining this apparent breakthrough. Discover announced the “death of junk DNA”. “Far From ‘Junk’,” headlined Gina Kolata’s article in The New York Times, with nary a hint of challenge to the paradigm-shifting conclusions, particularly the 80% claim. Robin McKie, The Guardian (UK)’s top-flight science editor, as recently as this past weekend characterized last September’s announcement as the scientific surprise of 2012. Examples of gushing, uncritical coverage abounded.

On second thought

Actually, the broad strokes of what the consortium found had been known for years. Evolution is unforgiving. If the roulette wheel of genetics lands on our number, and we get a beneficial mutation, our descendants are likely to thrive and reproduce. Future generations unlucky enough to inherit a harmful mutation are history. But where amongst our 20,000 or so genes can we find the DNA material that matters: the sequences that code for the proteins underlying our physical and behavioral characteristics?


Until the past decade or so, it had been accepted wisdom in the genetics community that only the tiniest percentage of the human genome contains the instructions that determine how we look, feel and act—whether we (or our ancestral population group) are more likely to be grumpy or gregarious, impetuous or cautious, generous or a Grinch, a speedster or a marathoner, slow-witted or a math ace. Most mammalian DNA—more than 98 percent of it—was considered accumulated evolutionary “junk.”

Scientists long likened this genetic material, which they sometimes called the “dark matter” of the human genome, to a discarded heap of outdated books, with the relevant wisdom incorporated in newer, revised volumes squeezed into the most usable 2 percent or less. This vast majority of genetic material, known as noncoding DNA, mostly embedded within and around the genes, was thought to play an important but murky role in regulating how the coding genes go about their business.

That Encode announcement last September—contested by many scientists but embraced mostly uncritically by science journalists—challenged the established view. The focus of most researchers had largely been on looking for glitches within genes themselves. The Encode research suggested we should look elsewhere in our DNA sequence—to the genetic junkyard. It was said to usher in a new chapter in our understanding of how genes operate.

This new perspective, they said, would open new leads for scientists looking for treatments for conditions such as heart disease, atherosclerosis, type 2 diabetes, psoriasis and Crohn’s disease that have their roots partly in glitches in the DNA.

The sh— hits the fan

Most scientists had expected that the Encode researchers would uncover some new functions for non-coding DNA, but the 80% figure was way out of proportion to anything anyone had anticipated. The problem was that the consortium used a very low bar for “function”. In a rebuke startling to credulous journalists but not to the genetics community, a caustic and often sarcastic critique, astonishingly titled “On the immortality of television sets: ‘function’ in the human genome according to the evolution-free gospel of ENCODE” and just published in the journal Genome Biology and Evolution, takes down the Encode scientists in language usually encountered only in late-night pub brawls.

“Everything that Encode claims is wrong,” charged the lead author of the paper, Professor Dan Graur, of the University of Houston. Graur and his co-authors, who are among the leading geneticists in the world today, claim the Encode group made a Genetics 101 mistake: confusing biological activity with functional importance in the cell. “They completely exaggerated the amount of human DNA that has a role to play inside our cells. Most of the human genome is devoid of function and these people are wrong to say otherwise.”

“This is not the work of scientists,” he added in uncharacteristically harsh language for an academic journal. “This is the work of a group of badly trained technicians.”

Birney immediately fired back. “The nature of the attacks against us is quite unfair and uncalled-for,” he said. “Our work has very important implications for understanding disease susceptibility.” The Encode project involved 442 researchers based at 32 institutes around the world, and required 300 years of computer time and five years of lab work to produce its results.

For perspective, let’s be clear that even Birney himself was chagrined by the media coverage in the days after the release of the Encode report last September. As he pointed out at the time on his blog Genome Informatician, what should have gotten the most attention—the publication of years of accumulated raw data for general use by the scientific community—was lost in all the hullabaloo of the hyped conclusion, which was never what the study was supposed to be about.

“The overall importance of consortia science can not be assessed until years after the data are assembled,” Birney wrote in his Nature article last September. “But reference data sets are repeatedly used by numerous scientists worldwide, often long after the consortium disbands.” As a result, the data production scientists, the real heavy lifters, got short shrift when credit was being distributed.

Birney also identified the elephant in the room: the wording of the claim that the genome is 80% functional. He wrote that it was a real mistake, even though he had helped craft the original news release. Yes, there might be biological activity, but “functional”? Much of it was almost certainly just “biological noise,” he said—although this candor did not make its way into the news release that spurred thousands of overheated stories.

As PZ Myers, a respected biologist at the University of Minnesota-Morris, noted over the weekend on his popular Pharyngula blog, the Encode research consortium that claimed to have identified function in 80% of the genome actually discovered that a formula of 80% hype gets you the attention of the world press, a point he made in his analysis last fall. The “Encode delusion,” he called it. Within days of the Encode announcement, a US Circuit Court heard arguments challenging California’s warrantless DNA collection program based on the claim that most of our DNA is functional.

Myers, like most geneticists, almost all of whom were ignored by the major media, was grateful for the raw data but underwhelmed by the overarching conclusion. It’s “patently ridiculous,” he wrote. “That isn’t function,” he said of what was identified. “That isn’t even close. And it’s a million light years away from ‘a critical role in controlling how our cells, tissue and organs behave.’ All that says is that any one bit of DNA is going to have something bound to it at some point in some cell in the human body, or may even be transcribed. This isn’t just a loose and liberal definition of ‘function’, it’s an utterly useless one.”

What’s the real story? In a desire to create a neat narrative, the Encode team appeared to have bewitched themselves. Our DNA is extremely complex—like “opening a wiring closet and seeing a hairball of wires,” said Mark Gerstein, an Encode researcher from Yale University, last fall. “We tried to unravel this hairball and make it interpretable.” In their understandable zeal to make things comprehensible, many key scientists in the project and most journalists stumbled badly.

This saga has yet to fully play out.

More recriminations and nastiness undoubtedly lie ahead. The brouhaha has already set off debates over professionalism within the science community. It’s also emblematic of the deep rifts in the research community, especially amongst genomics researchers. But the biggest disappointment so far is the lack of self-reflection by many science journalists, who by and large let their critical instincts lapse, exchanging the grayer and perhaps duller reality for a sensationalistic headline. Will they own up?

Jon Entine, executive director of the Genetic Literacy Project, is a senior fellow at the Center for Health & Risk Communication and STATS (Statistical Assessment Service) at George Mason University.

  • THEMAYAN

    It’s interesting that the article above provided no actual quotes from Birney, or at least did not use quotation marks; quotation marks were used only on the words “functional” and “biological noise.” I find this interesting because in the official ENCODE video, Ewan Birney says with his own lips, when speaking of this 80% (and if I may be allowed to paraphrase): we may not understand all of what it is doing, but it is probably doing something important.

    As for Dan Graur’s paper: just reading the abstract alone, it sounded more like a hit piece than professional scientific writing. The mean-spirited tone reeked of anger and bias.

    As I read further, I was surprised to find the authors paraphrasing Frank Zappa. Don’t get me wrong, I loved Zappa, but I think even he would have said it would be very silly to use any of his utterances in a science journal, especially one that reads as more personal than unbiased.

    “Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth,” —Robert Royar (1994) paraphrasing Frank Zappa’s (1979) anadiplosis

    I also found it interesting that they quoted T. R. Gregory, who is critical of ENCODE but for completely different reasons. According to Gregory, we supposedly knew about function decades ago, so this should be no big surprise. Of course, as I had to remind him, maybe one of the problems lay in the fact that many scientists ignored this data (they should have just stuck to science and not gotten involved in the culture war); it is well documented that many instead held up this useless junk DNA paradigm as a poster child for bad design, with all this supposed empirical evidence to back it up. Like many others, Gregory follows the logic that if the data is incongruent with the theory, then the data must be wrong, as he speaks of his “onion test” concerning the C-value paradox below.

    “The onion test is a simple reality check for anyone who thinks they can assign a function to every nucleotide in the human genome. Whatever your proposed functions are, ask yourself this question: Why does an onion need a genome that is about five times larger than ours?” —T. Ryan Gregory

    They also quoted PZ Myers, who tells his students that those who search for function in ncDNA are only interested in job security, i.e., let’s just give up research because, after all, we already know it’s junk. This mindset is a science stopper.

    Dan Graur
    “playing fast and loose with the term ‘function,’ by divorcing genomic analysis from its evolutionary context and ignoring a century of population genetics theory”

    Dan, maybe it’s time to update these 80-year-old constructs, as this paper below, which is one of many, indicates:

    The new biology: beyond the Modern Synthesis. Michael R. Rose and Todd H. Oakley. “The last third of the 20th Century featured an accumulation of research findings that severely challenged the assumptions of the ‘Modern Synthesis’ which provided the foundations for most biological research during that century. The foundations of that ‘Modernist’ biology had thus largely crumbled by the start of the 21st Century. This in turn raises the question of foundations for biology in the 21st Century.”


    Dan Graur
    “There are two almost identical sequences in the genome. The first, TATAAA, has been maintained by natural selection to bind a transcription factor, hence, its selected effect function is to bind this transcription factor. A second sequence has arisen by mutation and, purely by chance, it resembles the first sequence; therefore, it also binds the transcription factor. However, transcription factor binding to the second sequence does not result in transcription, i.e., it has no adaptive or maladaptive consequence. Thus, the second sequence has no selected effect function, but its causal role function is to bind a transcription factor”

    Here is what ENCODE’s lead analysis coordinator E. Birney says about this….

    “Rather than being inert, the portions of DNA that do not code for genes contain about 4 million so-called gene switches, transcription factors that control when our genes turn on and off and how much protein they make, not only affecting all the cells and organs in our body, but doing so at different points in our lifetime. Somewhere amidst that 80% of DNA, for example, lie the instructions that coax an uncommitted cell in a growing embryo to form a brain neuron, or direct a cell in the pancreas to churn out insulin after a meal, or guide a skin cell to bud off and replace a predecessor that has sloughed off”


    Dan Graur
    “The human genome is rife with dead copies of protein-coding and RNA-specifying genes that have been rendered inactive by mutation. These elements are called pseudogenes (Karro et al. 2007). Pseudogenes come in many flavors (e.g., processed, duplicated, unitary) and, by definition, they are nonfunctional”

    Not according to the paper below:

    Pseudogenes: Are They “Junk” or Functional DNA? Annual Review of Genetics, Vol. 37: 123-151 (December 2003). First published online as a Review in Advance on June 25, 2003. DOI: 10.1146/annurev.genet.37.040103.103949

    “Pseudogenes have been defined as nonfunctional sequences of genomic DNA originally derived from functional genes. It is therefore assumed that all pseudogene mutations are selectively neutral and have equal probability to become fixed in the population. Rather, pseudogenes that have been suitably investigated often exhibit functional roles, such as gene expression, gene regulation, generation of genetic (antibody, antigenic, and other) diversity. Pseudogenes are involved in gene conversion or recombination with functional genes. Pseudogenes exhibit evolutionary conservation of gene sequence, reduced nucleotide variability, excess synonymous over nonsynonymous nucleotide polymorphism, and other features that are expected in genes or DNA sequences that have functional roles.”

    It seems the biggest criticism in this paper is in how the word function is used, as its definition of function is broad; but it also seems kind of silly not to expect such a broad definition when the findings themselves are so broad. And again, just because the findings seem incongruent with how we view selection based on the modern synthesis (and/or what Stuart Newman refers to as these old entrenched dogmas), it does not mean the theory should trump scientific revelation and the discovery of new empirical data. Maybe it’s the theory that needs changing. One very well known scientist once told me: scientists don’t change their minds, they just die.