Meeting the challenge

Why genetic variation is important

Nearly a decade after the draft sequence of the human genome was announced in 2000, several lessons have become clear. First, human genetic variation has turned out to be far more complex than initially imagined. Forms of variation include single-base changes as well as copy number variants, structural variants, insertions, deletions, inversions, and others. Second, assigning specific clinical utility to these variations has been more difficult than originally anticipated. The difficulty stems from biology, but also from statistics, logistics and economics. Only by designing a route around these difficulties can we realize the benefits promised by the genetics revolution.

The search for clinical relevance

Regardless of the underlying structural complexity of the genome, the likely clinical relevance of any given variation is probably best understood as the interaction of three considerations:

  1. The prevalence of the variation in the population
  2. The specific effect(s) of the variation
  3. The clinical decisions these effects might influence

Prevalence of the variation: tagging SNPs

In Perlegen’s experience, population prevalence is a key starting point, specifically with regard to what are termed “common” variants. Such variants are defined as occurring in at least five percent of the population, in contrast to rarer variants, which are generally defined as occurring in less than one percent of the population. Studying or “mapping” common variations has offered significant insight into the underlying fabric of human genetics. The markers most often used in mapping these variations have been single nucleotide polymorphisms, better known as SNPs. A SNP occurs when a particular single base position is variable within a population. That is, some members of the population have the “reference allele” and other members of the population have the “alternate allele.”

Mapping the several million known SNPs that occur across a variety of populations has helped to define haplotypes, or patterns in which SNP alleles are correlated with one another such that “tag SNPs” may be chosen that are predictive of other SNPs in the haplotype. Thus, only one or a few tag SNPs need to be analyzed or “genotyped” to determine the genotypes for most or all of the SNPs in a given haplotype. In this way, identifying and genotyping tag SNPs across the entire genome greatly facilitates the process of analyzing the common genetic variations in any one individual. Collections of several hundred thousand tag SNPs form the basis for genome-wide association studies that focus on phenotypes such as predisposition to a particular disease, variability in response to specific drugs, and other traits of interest.
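The tag SNP idea described above can be sketched in a few lines. The example below is a minimal illustration, not Perlegen's actual selection method: it measures correlation between SNPs as the squared Pearson correlation of unphased genotype counts (a simplification of haplotype-based measures), and it picks tags greedily. The function names, the toy genotypes, and the 0.8 threshold are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a SNP with no variation predicts nothing about other SNPs
    return cov * cov / (var_x * var_y)


def select_tag_snps(genotypes, threshold=0.8):
    """Greedy selection: repeatedly pick the SNP that predicts the most untagged SNPs.

    `genotypes` maps a SNP name to its genotype vector across individuals.
    The threshold of 0.8 is an assumed convention, not a value from the text.
    """
    names = list(genotypes)
    # covers[s] holds every SNP that s predicts well enough, including s itself.
    covers = {s: {s} for s in names}
    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical genotypes for six individuals at four SNPs.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical pattern to snp1
    "snp3": [2, 1, 0, 2, 1, 0],  # mirror image of snp1, still fully predictive
    "snp4": [0, 0, 1, 1, 2, 2],  # only weakly correlated with the others
}
tags = select_tag_snps(genotypes)
print(tags)
```

In this toy data set, genotyping two of the four SNPs is enough to recover the rest, which is the economy that tag SNPs provide at genome scale.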

The effects of variation

Numerous completed genome-wide association studies have demonstrated the effectiveness of this approach in identifying statistically robust SNP associations with disease. Perlegen’s successes with this approach include studies in myocardial infarction, breast cancer and other diseases. However, the clinical effects discovered are generally small, typically elevating the relative risk of the trait of interest by about 30–50 percent. Equally problematic, validating such small effects statistically requires very large studies, with DNA samples from 10,000 patients or more.
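A rough power calculation shows why small effects demand large studies. The sketch below uses the standard normal-approximation formula for comparing allele frequencies between cases and controls. Several inputs are assumptions rather than figures from this article: the odds ratio is used as a stand-in for relative risk, the significance level of 5e-8 is a commonly used genome-wide convention, and the control allele frequencies are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency expected in cases, given the control frequency and a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def cases_needed(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of cases (with an equal number of controls) for a two-sided allelic test.

    alpha=5e-8 and power=0.8 are assumed defaults, not values from the text.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    alleles_per_group = (z_alpha + z_power) ** 2 * variance / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2)  # each person contributes two alleles


# Hypothetical scenarios spanning the 30-50 percent effect range mentioned above.
for freq, ratio in [(0.20, 1.5), (0.20, 1.3), (0.05, 1.3)]:
    print(f"control freq {freq:.2f}, odds ratio {ratio}: ~{cases_needed(freq, ratio)} cases")
```

Under these assumptions the required number of cases runs into the thousands, and it climbs quickly as the effect size or the allele frequency falls, which is consistent with the study sizes described above.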

Sample collections of this scale have been completed for some, but by no means all, questions of disease predisposition. In contrast, collections of this size for drug efficacy or drug toxicity are almost non-existent, a gap that stems from a host of historical, economic and regulatory considerations.

Once we understand that common variations, while potentially significant in number across populations, are generally associated with only limited effect size in individuals, the critical question becomes: where might we find variations with greater individual effects? Our answer is informed by very high-effect, but very rare, single gene disorders, which illustrate how greater-effect variants can be individually rare yet, taken together, account for a substantial share of a clinically important phenotype.

Direct genomic resequencing of DNA from affected individuals is required to detect these rarer, greater-effect genetic variants. This was impractical at the necessary scale before the advent of next-generation high-throughput sequencing platforms. However, as these techniques have improved in accuracy and read length, and dropped dramatically in price, the search for rarer variants has become significantly more practical.

These developments have a critical logistical consequence as well: with larger effect sizes grouped gene-by-gene, rather than variant-by-variant, the statistics of discovery improve remarkably, requiring merely hundreds, rather than thousands, of affected individuals. Importantly, the unit of analysis for these variations is the gene in which they occur, either in coding or regulatory regions, and not the specific associated variant. Analyzing only thousands of genes is a much more manageable undertaking than working through the multiple-comparison problems that drive false discovery when comparing hundreds of thousands or millions of SNPs.
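The statistical advantage of gene-level analysis can be illustrated with a small sketch. It collapses all rare variants in a gene into a single "carrier or not" indicator per individual and compares carrier counts between cases and controls with a one-sided Fisher's exact test. This collapsing test is a generic technique chosen for illustration, not necessarily the analysis Perlegen uses, and the gene count (20,000), SNP count (1,000,000), and sample data are assumed round numbers.

```python
from math import comb


def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


def fisher_exact_one_sided(carriers_cases, n_cases, carriers_controls, n_controls):
    """Probability of seeing at least this many carriers among cases by chance."""
    total = n_cases + n_controls
    carriers = carriers_cases + carriers_controls
    upper = min(carriers, n_cases)
    favorable = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(carriers_cases, upper + 1)
    )
    return favorable / comb(total, carriers)


def gene_burden_pvalue(case_variants, control_variants):
    """Collapse each individual's rare variants in one gene to carrier / non-carrier, then test."""
    carriers_cases = sum(1 for variants in case_variants if variants)
    carriers_controls = sum(1 for variants in control_variants if variants)
    return fisher_exact_one_sided(
        carriers_cases, len(case_variants), carriers_controls, len(control_variants)
    )


# Hypothetical gene: 200 cases and 200 controls. Thirty cases carry one of ten
# different rare variants (each variant seen only three times); two controls carry one.
cases = [{f"var{i % 10}"} for i in range(30)] + [set()] * 170
controls = [{"var0"}, {"var1"}] + [set()] * 198

gene_p = gene_burden_pvalue(cases, controls)
single_variant_p = fisher_exact_one_sided(3, 200, 0, 200)  # one variant tested alone

print("gene-level threshold:", bonferroni_threshold(20_000))
print("SNP-level threshold: ", bonferroni_threshold(1_000_000))
print("gene burden p-value: ", gene_p)
print("single variant p:    ", single_variant_p)
```

In this constructed example no individual variant comes close to significance, because each is seen only a handful of times, yet the gene as a whole clears its correction threshold with only a few hundred individuals. Testing fewer units also means the threshold itself is less severe.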

Influence on clinical decisions

Sequencing studies powered to detect larger effect sizes among similarly affected individuals are a necessary complement to SNP genotyping and whole-genome association studies for the discovery of clinically relevant human genetic variation. Even with larger effect sizes, however, clinical utility will require the availability of treatment options that limit or remove the effect of a genetic predisposition. Only when the results of a genetic inquiry can push a specific, actionable decision in one direction or another is the clinical utility of a genetic diagnostic, and therefore its commercial viability, likely to be assured.

Beyond SNP genotyping

At present, Perlegen is expanding its technology base beyond SNP genotyping to allow for high-throughput, massively parallel gene sequencing. This approach captures all variations other than copy number, not just SNPs, and analyzes the results on a gene-by-gene basis, rather than on a specific variant-by-variant basis. The result is a dramatic improvement in discovery statistics, ensuring Perlegen’s continued lead in uncovering novel, clinically important genetic variations.