Use of Genomics Information in a Genetic Selection Program
Archie C. Clutter
DEKALB Choice Genetics
The National Swine Improvement Federation has served for many years to facilitate the application of quantitative genetics to practical swine breeding. Classical methods of animal breeding have been successfully applied in the pork industry to improve growth performance, body composition and litter size. But as the industry strives to produce the meat of choice in the 21st century, we look to new methods and new technologies to maximize genetic progress in the traits that will give pork a competitive advantage in the food market.
At the forefront of scientific and technological areas that hold great promise for the pork industry is genomics, the study of how individual genes affect observable traits. My objectives are to describe 1) some fundamentals of genomics and how they relate to traits that are important to the pork industry, 2) the resources required for comprehensive discovery and development of genomics information, and 3) the application of that information in genetic selection.
Fundamentals of Genomics
While many in the audience have a good understanding of the scientific fundamentals involved, it is probably most effective to start this discussion of genomics from the ground up.
Genomics is the study of how specific genes are linked to observable traits. At Monsanto/DEKALB Choice Genetics, our genomics program is aimed at genes that affect trait areas of greatest economic importance to our customers: animal efficiency, meat quality, sow prolificacy and animal health. The most basic question underlying genomics then is “What is a gene?”
Whatever the genetic message is that is passed from parent to offspring, it must be contained in the sperm and the egg. The sperm cell is essentially a nucleus with a tail, so intuitively we know that the genetic message (i.e., genes) must be contained in the nucleus of the cell. The primary structures inside the nucleus are the chromosomes. Figure 1 is a picture of the 38 chromosomes (19 pairs) found in the nucleus of a tissue cell from a pig. If we look more closely at the chromosomes, we find that each is a long, coiled piece of deoxyribonucleic acid (DNA). The double helix structure of DNA (Figure 2) is by now very familiar to most of us.
Looking even more closely at the structure of DNA, we find a series or sequence of what are called nitrogenous bases along the double helix. The different bases are represented with the letters A, T, C and G. It is the sequence of these four bases along the DNA molecule that is the genetic message. That message is the instructions for building a specific protein in the cell. The DNA sequence is transcribed into RNA which is carried to the cytoplasm of the cell where it is translated and the protein is built.
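As a minimal illustration of the flow from DNA sequence to protein described above, the Python sketch below transcribes a short DNA sequence into RNA and translates it three bases at a time. The example sequence is made up (it is not a real pig gene), the function names are hypothetical, and the codon table is only a small excerpt of the standard genetic code.

```python
# Excerpt of the standard genetic code (RNA codon -> amino acid).
# Only the codons needed for the example are listed; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the usual start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
}


def transcribe(coding_dna):
    """Return the RNA copy of a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(rna):
    """Read the RNA three bases at a time and build the protein until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


example_dna = "ATGGCTTGGAAATAA"  # hypothetical sequence for illustration only
example_rna = transcribe(example_dna)
example_protein = translate(example_rna)
print(example_rna, example_protein)
```

A single base change in the DNA can change a codon, and therefore an amino acid in the protein, which is one way that different alleles of a gene can produce different phenotypes.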
How can simply the instructions for building proteins accomplish everything we know about the power of genetics and inheritance? How can it determine skin color or baldness, or the potential for an animal to grow efficiently or produce quality product? If you consider some of the roles of protein in the body: hormones, enzymes, pigments, structural proteins, regulators of gene expression, etc., you can start to understand the power of this message and the power of genes.
So, “What is a gene?” It is a sequence of DNA at a specific point on a chromosome that is the instructions for building a specific protein.
To appreciate the challenges associated with genomics, it is also important to understand the vastness of the genome. It is estimated that the DNA that makes up all 19 different chromosomes found in the nucleus of a single pig cell contains in total a sequence of approximately 3 billion bases (A’s, T’s, C’s, and G’s). While not all of this sequence is instructions for making protein (much of it is just taking up space), there are estimated to be several tens of thousands of genes (sets of instructions for specific proteins) among the billions of bases.
A fundamental tool of genomics is the genetic marker. A genetic marker is a detectable difference in DNA sequence that allows a specific point on the chromosome to be ‘tagged’ and followed through the generations of a family using a laboratory test. Some markers are based on a DNA sequence within a gene, but many are based on DNA sequence in between genes and simply tag an anonymous point along the chromosome. The concept of using these markers to look for important genes is then relatively simple (see Figure 3) – if enough markers are put on each of the chromosomes, wherever the genes are that we seek, they must be located near (linked to) a marker. By determining that a marker is associated with a difference in a trait of interest, we have determined the approximate location of a gene involved (i.e., there must be a gene affecting the trait somewhere near that marker). Effective use of markers in this way to “scan the genome” for important genes requires a dense enough set of markers along each of the chromosomes, an appropriate family structure and enough progeny to accurately determine which markers are truly associated with a difference in phenotype. The application of this approach in the pig will be reviewed later in this paper.
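The idea of testing whether a marker is associated with a difference in a trait can be sketched in a few lines of Python. The example below regresses phenotype on the number of copies of one marker allele that each animal carries. The records, the trait and the function name are hypothetical, and a real genome scan would also need a significance test, an adjustment for the many markers tested, and an analysis that accounts for family structure.

```python
from statistics import mean


def marker_association(allele_counts, phenotypes):
    """Regress phenotype on the number of copies (0, 1 or 2) of one marker allele.

    Returns the slope, i.e. the estimated phenotypic change per copy of the allele.
    """
    x_bar = mean(allele_counts)
    y_bar = mean(phenotypes)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(allele_counts, phenotypes))
    sxx = sum((x - x_bar) ** 2 for x in allele_counts)
    return sxy / sxx


# Hypothetical records: copies of marker allele "A" and daily gain in g/day
allele_counts = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
daily_gain = [810, 795, 830, 845, 820, 860, 870, 800, 835, 855]

effect_per_copy = marker_association(allele_counts, daily_gain)
print(f"Estimated effect per copy of the marker allele: {effect_per_copy:.1f} g/day")
```

A slope clearly different from zero suggests that a gene affecting the trait lies somewhere near the marker; it does not identify the gene itself.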
Complex Quantitative Traits and Traditional Animal Breeding
Although some traits are inherited in a simple manner because they are controlled by one or two genes and are relatively unaffected by the environment (e.g., coat color), the majority of traits of great economic importance to the pork industry are much more complex. Traits like growth rate, efficiency, % lean, water-holding capacity, intramuscular fat, age of puberty, and litter size are quantitative traits, each affected by many genes with relatively small individual effects and to varying degrees by the environment.
In a population of pigs, genetic variation for one of these quantitative traits is due to differences among animals in the forms (alleles) of these many genes that they possess. Genetic improvement comes from predicting the value of the alleles that an animal possesses and selecting those animals that will be of greatest value as parents. With traditional animal breeding methods, we have been forced to take a “black box” approach in which the value of the alleles an animal possesses is estimated without any knowledge of the identity or location of the genes involved. Many past NSIF sessions have highlighted the now classical methods of Best Linear Unbiased Prediction (BLUP), by which phenotypic information for traits of economic importance, along with the genetic relationships among the animals in the population (both those with and without records), is used to estimate the value of the alleles possessed by an animal (i.e., its breeding value). For many traits these conventional methods are very effective. For example, for a trait like average daily gain (ADG) that can be easily measured in both sexes before selection takes place and is moderately heritable, response per generation from traditional selection is considerable.
But for sex-limited or lowly heritable traits (e.g., litter size), traits only measured effectively in the carcass (e.g., meat quality) or traits that are simply difficult and expensive to measure (e.g., feed intake), improvement with traditional methods is more challenging. These are the types of traits for which the potential benefits from genomics are greatest. Simply put, genomics is a way to open the black box and look inside, with the objective of using information regarding the individual gene alleles that an animal possesses to make more effective selection decisions and enhance the rate of genetic improvement.
Before we move on to the next section, here is a summary of some important terms:
· Locus is the specific location along the chromosome of a gene or marker (plural is “loci”).
· Quantitative Trait Locus (QTL) is the location of a gene affecting a quantitative trait.
· Genetic Marker is a detectable difference in DNA sequence that can be used to ‘tag’ a specific point on the chromosome.
· Linked Marker is a marker located nearby (i.e., linked to) an important gene (e.g., linked to a QTL).
· Marker-Assisted Selection is when information on linked markers is used in combination with phenotypic information to estimate breeding value and select animals to be parents.
· Recombination is the exchange of pieces between chromosomes in a pair that occurs naturally every time sperm and eggs are produced. This process can break down the relationship between a linked marker and a QTL. Consequently, only markers very closely linked to a QTL can be used effectively in marker-assisted selection.
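The effect of recombination on a linked marker can be illustrated with a standard population genetics result: under random mating, the association between a marker and a QTL is expected to decay by a factor of (1 - c) each generation, where c is the recombination rate between them. The Python sketch below applies that formula; the recombination rates, the number of generations and the function name are assumptions chosen only for illustration.

```python
def remaining_association(recombination_rate, generations):
    """Fraction of the initial marker-QTL association expected to remain after a
    number of generations, assuming random mating: (1 - c) ** t."""
    return (1.0 - recombination_rate) ** generations


# Compare a loosely linked, a moderately linked and a closely linked marker
for c in (0.20, 0.05, 0.01):
    left = remaining_association(c, generations=10)
    print(f"recombination rate {c:.2f}: {left:.2f} of the association remains after 10 generations")
```

The closer the marker is to the QTL (the smaller c), the longer the marker remains a reliable tag for the favorable allele, which is why fine-mapping matters for marker-assisted selection.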
“Who are these genes and where are they hiding?”
So if genomics is the means by which we open up the black box and identify the genes that affect traits of economic importance to our industry, let’s hurry up and open the box! Two main points of the previous sections emphasize the major challenges encountered in genomics: 1) the complexity of important quantitative traits (each trait affected by many genes with relatively small individual effects, and by the environment), and 2) the vastness of the genome (tens of thousands of genes residing in a sequence of billions of bases).
Although the challenges are significant, the technology exists for discovery of many of these genes affecting important traits and for their use in genetic selection programs. In the mid- and late 1990s, laboratory tests were developed for anonymous markers throughout the genomes of livestock species. Figure 4 shows an example from the late 1990s of a set of markers for chromosome 6 in the pig, developed and made available through the USDA MARC in Clay Center, NE. The availability of these markers made possible the first scans of the genome for important livestock genes, and several of these scans have now been reported for the pig (e.g., Andersson et al., 1994; Bidanel et al., 2001; Casas-Carrillo et al., 1997; Cassady et al., 2001; Malek et al., 2001; Paszek et al., 1999).
These initial scans in livestock and other mammalian species have yielded results largely consistent with expectations based on the characteristics of quantitative traits, and have demonstrated the challenges associated with taking genomics information to the point of application:
· Several regions of the genome contain genes causing variation in a given quantitative trait, and most of these genes have small to moderate effects on the trait. For example, using a powerful half-sib (daughter) design in dairy cattle, Mosig et al. (2001) were able to detect most of the genes contributing to genetic variation in milk protein percentage. They reported that between 23 and 28 QTL on 22 different chromosomes cause nearly 100% of the genetic variation in this single trait. Using reported results from genome scans in dairy cattle and pigs, Hayes and Goddard (2001) estimated the likely distribution of QTL effects for a typical phenotypic trait. They concluded that there are likely many QTL of small effect and few of large effect that contribute to the genetic variation in production traits, and that genome scans of great power will be needed to detect all the QTL that explain a majority (> 50%) of that variation.
· Many of the important chromosomal regions revealed in genome scans do not correspond with the known locations of genes with biological functions related to the trait. This is true even in species like the mouse for which the biological function of a large number of genes is known. Although the mouse genome has been scanned several times for genes that cause variation in growth rate, most of the significant regions do not correspond with the locations of known genes with biological function related to growth and none of these scans have identified what might seem like a gene with an obvious role, the growth hormone gene, as a contributor to genetic variation (Pomp, 1997). It is important here to remember that a gene can have an important function in the biology of a trait, but not contribute to genetic variation because it is fixed in the population (i.e., has only one form) or has alleles with non-significant differences. An example of this in the pig is a scan conducted by Rohrer et al. (2001) for genes contributing to genetic variation in FSH concentration. Several chromosomal regions with significant effects on FSH concentration were detected, but they did not include the region known to contain the FSH gene. These results indicate that many of the genes that cause genetic variation in a trait, and thereby provide the raw material for genetic selection, cannot be predicted at the beginning of a gene search (e.g., they may be genes that code for proteins that regulate the expression of other genes).
· Initial scans of the genome, if designed appropriately and with adequate power, can allow the detection of chromosomal regions containing QTL with large, moderate or even small effects on important traits. But the size of the region typically detected makes the application of marker-assisted selection, let alone identification of the responsible gene(s), difficult. For example, most reported individual QTL in pigs have been mapped to regions of a chromosome that include 20 to 40 million bases of DNA sequence and perhaps 600 to 1,200 genes. Marker-assisted selection can only be effectively applied after an additional process of verification and fine-mapping in which markers very closely linked to the QTL (causal gene) are associated with the trait(s) of interest (Figure 5).
Thus, a genomics program aimed at harvesting information on a large proportion of the individual genes affecting a quantitative trait must assume little or no prior knowledge of the identity or location of those genes and cast a wide net through powerful genome scans. Even with powerful detection, the identity of the genes causing variation in quantitative traits will remain very elusive. Fortunately, the true identity of the causal gene is not required for effective marker-assisted selection; rather, what is needed is a detailed knowledge of the location of the gene. Reaching that point requires a broad and powerful genomics platform.
A Comprehensive and Powerful Genomics Platform
A genomics platform built for successful discovery and application must address three primary needs: 1) detection and mapping of a large proportion of the QTL affecting important quantitative traits through the association of markers to phenotypes, 2) fine-mapping of the detected QTL by identifying informative markers linked closely to the causal gene, and 3) verification of both the direct and correlated effects of the QTL in the target population(s) and effective use of the finely mapped QTL to increase the rate of genetic improvement realized by the customer.
Although many genome scans have now been reported in the pig, few have been based on large enough populations to result in a high probability of finding QTL with small effects (e.g., less than 0.25 phenotypic SD). In many cases, marker genotypes and phenotypic data are needed on several thousand animals to achieve the desired power of detection. At Monsanto/DEKALB Choice Genetics we implement genome scans in a variety of internal populations, some specifically produced for powerful gene mapping, so that a large proportion of the QTL affecting traits of great importance to our customers and segregating (available for selection) in our nucleus lines will be discovered. This requires not only access to the appropriate populations, but also tremendous resources to assure the collection of high quality genotypes and phenotypes on a high-throughput scale. A paper by Dr. Sam Buttram in these proceedings focuses on the collection and processing of quality data at DEKALB Choice Genetics.
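A rough sense of why small effects demand large populations can be had from a textbook sample-size approximation for comparing the means of two groups. The Python sketch below is a simplified illustration, not the design used in any of the scans cited here: it assumes two genotype classes of equal size, a normally distributed trait, a two-sided test, and assumed values for the significance threshold and power. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def animals_per_genotype_class(effect_sd, alpha, power):
    """Approximate number of animals needed in each of two genotype classes to
    detect a difference of `effect_sd` phenotypic SD between their means
    (two-sided normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(2.0 * (z_alpha + z_power) ** 2 / effect_sd ** 2)


# Assumed values: a stringent per-test threshold to allow for the many markers
# tested in a genome scan, and 90% power.
for effect in (1.0, 0.5, 0.25):
    n = animals_per_genotype_class(effect, alpha=0.0001, power=0.90)
    print(f"effect of {effect} SD: about {n} animals per genotype class")
```

Because the required number scales with the inverse square of the effect, halving the size of the effect to be detected roughly quadruples the number of animals needed. Practical designs require more animals still, since not every animal falls into the two contrasted classes and recombination between marker and QTL dilutes the observed difference.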
Tremendous resources are also needed to make the step from a detected QTL in a chromosomal region containing hundreds of genes to a finely-mapped QTL for which the relationship between markers and QTL will be relatively unaffected by recombination, a step that is essential for effective marker-assisted selection. In addition to optimally designed families from the populations targeted for selection, and powerful statistical tools, a critical need in this step is a large number of informative markers within each region containing an important QTL. Resources at Monsanto/DEKALB Choice Genetics like the porcine BAC map (Warren et al., 2001), a collection of DNA pieces representing all the pig chromosomes, will provide quicker access to new markers in QTL regions throughout the genome. Comparative mapping, which takes advantage of similarities among the genomes of mammalian species to tap into the large amount of genomic information from the human and mouse, and transcript profiling, a method of monitoring the gene expression that accompanies biological processes related to the trait, are also important parts of the Monsanto/DEKALB Choice Genetics platform that can be used effectively at this point to help generate new markers and suggest candidates for the causal gene. These processes quickly generate millions of pieces of information that must be processed and interpreted with the goal of refining the QTL map. Bioinformatics is a new scientific discipline that combines mathematics, computer science and molecular biology to deal with the massive amount of information necessary for fine-mapping of the genome. New and unique tools for the pig have been developed in the bioinformatics group at Monsanto/DEKALB Choice Genetics (Veenhuizen, 2000), but we also benefit from bioinformatics tools and previous experience that exist within other areas of the company.
Before marker-assisted selection occurs, each QTL must be evaluated in the population(s) targeted for selection. Adequate fine-mapping means that the evaluation can occur across families, but for each QTL the allele frequency, and direct and correlated effects must be determined. This process is accelerated by mapping the QTL in the same populations targeted for selection. Verified QTL are available for incorporation into the breeding program.
Application of Genomics Information – Enhancing Traditional Pig Breeding
It is important to point out that genomics is not a replacement for quantitative genetics and animal breeding, but that for certain aspects of production and product quality it has the potential to greatly enhance conventional breeding methods. At Monsanto/DEKALB Choice Genetics there is a solid base of product lines and quantitative animal breeding practices upon which our genomics program is built. Quantitative and molecular breeders work in concert to ensure that maximum value will be realized from use of genomics information in the breeding program.
Recall the traits for which conventional animal breeding methods encounter the greatest barriers and genomics potentially has the most to offer: sex-limited traits, lowly heritable traits, traits expressed late in life (after selection), and traits that are difficult or expensive to measure. Once QTL have been verified as described above,
These features of markers can be exploited to impact the rate of genetic improvement through greater accuracy of selection, greater intensity of selection from the testing (through genotyping) of a greater number of animals, and in some cases shorter generation interval because animals can be measured earlier in life.
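The three factors named above (accuracy, intensity and generation interval) are the terms of the standard textbook expression for expected response to selection per unit of time. It is given here only as a reminder; the symbols are the conventional ones and do not come from this paper.

```latex
\Delta G_{\text{year}} = \frac{i \, r \, \sigma_A}{L}
```

Here i is the selection intensity, r is the accuracy of selection (the correlation between estimated and true breeding value), \sigma_A is the additive genetic standard deviation of the trait, and L is the generation interval in years. Marker information can raise r, allow a larger i through genotyping of more candidates, and in some cases reduce L.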
The marker information can be used in a variety of ways to rank animals for selection. In the simplest cases, for example for a known undesirable mutation such as the halothane allele, marker information can be used to eradicate the allele from the population. Markers closely linked to QTL affecting complex traits can be used in combination with phenotypic information available on the individual or relatives in various forms of marker-assisted genetic evaluation. The most complete integration of marker information for complex traits into the traditional genetic evaluation is through marker-assisted BLUP (Fernando and Grossman, 1989). Marker-assisted BLUP generates the same sort of breeding values familiar to swine breeders, but with enhanced accuracy due to the incorporation of marker information.
There are many estimates of the potential value of marker-assisted selection in livestock breeding programs. Results from a simulation study by Meuwissen and Goddard (1996) are summarized in Figure 7, and reaffirm that it is for those traits most difficult to improve with conventional methods that genomics holds the greatest potential value for the pork industry.
Summary
Genomics is the study of how specific genes are linked to observable traits. For traits that are difficult to improve with conventional swine breeding methods, genomics offers significant potential value. But a comprehensive and resourceful genomics platform is required to overcome the great challenges associated with discovering QTL that explain a large proportion of the genetic variation in important complex traits, fine-mapping those QTL to allow for effective marker-assisted selection, and finally verifying and capturing their value in industry populations.
References
Andersson, L., C.S. Haley, H. Ellegren, S.A. Knott, M. Johansson, K. Andersson, L. Andersson-Eklund, I. Edfors-Lilja, M. Fredholm, I. Hansson, J. Hakansson and K. Lundstrom. 1994. Genetic mapping of quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.
Bidanel, J., D. Milan, N. Iannuccelli, Y. Amigues, M. Boscher, F. Bourgeois, J. Caritez, J. Gruand, P. Le Roy, H. Lagant, R. Quintanilla, C. Renard, J. Gellin, L. Ollivier and C. Chevalet. 2001. Detection of quantitative trait loci for growth and fatness in pigs. Genet. Sel. Evol. 33:289-309.
Casas-Carrillo, E., A. Prill-Adams, S.G. Price, A.C. Clutter and B.W. Kirkpatrick. 1997. Mapping genomic regions associated with growth rate in pigs. J. Anim. Sci. 75:2047-2053.
Cassady, J.P., R.K. Johnson, D. Pomp, G.A. Rohrer, L.D. Van Vleck, E.K. Spiegel and K.M. Gilson. 2001. Identification of quantitative trait loci affecting reproduction in pigs. J. Anim. Sci. 79:623-633.
Fernando, R.L., and M. Grossman. 1989. Marker assisted selection using best linear unbiased prediction. Genet. Sel. Evol. 21:467-477.
Hayes, B., and M.E. Goddard. 2001. The distribution of the effects of genes affecting quantitative traits in livestock. Genet. Sel. Evol. 33:209-229.
Malek, M., J.C.M. Dekkers, H.K. Lee, T.J. Baas, K. Prusa, E. Huff-Lonergan and M.F. Rothschild. 2001. A molecular genome scan analysis to identify chromosomal regions influencing economic traits in the pig. II. Meat and muscle composition. Mamm. Genome 12:637-645.
Meuwissen, T.H.E., and M.E. Goddard. 1996. The use of marker haplotypes in animal breeding schemes. Genet. Sel. Evol. 28:161-176.
Mosig, M.O., E. Lipkin, G. Khutoreskaya, E. Tchourzyna, M. Soller and A. Friedmann. 2001. A whole genome scan for quantitative trait loci affecting milk protein percentage in Israeli-Holstein cattle, by means of selective milk DNA pooling in a daughter design, using an adjusted false discovery rate criterion. Genetics 157:1683-1698.
Pomp, D. 1997. Genetic dissection of obesity in polygenic animal models. Behavior Genetics 27:285-306.
Rohrer, G.A., T.H. Wise, D.D. Lunstra and J.J. Ford. 2001. Identification of genomic regions controlling plasma FSH concentrations in Meishan-White composite boars. Physiol. Genomics 6:145-151.
Veenhuizen, J. 2000. Bioinformatics and swine genetic improvement. Pages 84-86. Proceedings of the Annual Conference of the National Swine Improvement Federation, Nashville, TN.
Warren, W., N. Tao, B. Barbazuk, T. Allison, T. Landwe, H. Chou, R. Kaegy, M. Maloney and S. Hall. 2001. A BAC-based physical map of the Sus scrofa genome. P589. Proceedings of the IX Plant and Animal Genome International Conference, San Diego, CA.