Use of Genomics Information in a Genetic Selection Program
Archie C. Clutter
DEKALB Choice Genetics, Monsanto Company
Introduction
The National
Swine Improvement Federation has served for many years to facilitate the application
of quantitative genetics to practical swine breeding. Classical methods of
animal breeding have been successfully applied in the pork industry to improve
growth performance, body composition and litter size. But as the industry
strives to produce the meat of choice in the 21st century, we look
to new methods and new technologies to maximize genetic progress in the traits
that will give pork a competitive advantage in the food market.
At the forefront of scientific and technological
areas that hold great promise for the pork industry is genomics, the study of
how individual genes affect observable traits. My objectives are to describe 1)
some fundamentals of genomics and how they relate to traits that are important
to the pork industry, 2) the resources required for comprehensive discovery and
development of genomics information, and 3) the application of that information
in genetic selection.
Fundamentals of Genomics
While many in
the audience have a good understanding of the scientific fundamentals involved,
it is probably most effective to start this discussion of genomics from the
ground up.
Genomics is the study of how
specific genes are linked to observable traits. At Monsanto/DEKALB Choice
Genetics, our genomics program is aimed at genes that affect trait areas of
greatest economic importance to our customers: animal efficiency, meat quality,
sow prolificacy and animal health. The most basic question underlying genomics
then is “What is a gene?”
Whatever genetic message is passed from parent to offspring, it must be
contained in the sperm and the egg. The sperm cell is essentially a nucleus with
a tail, so intuitively we know that the genetic message (i.e., the genes) must
be contained in the nucleus of the cell. The primary structures inside the
nucleus are the chromosomes. Figure 1 shows the 38 chromosomes (19 pairs) found
in the nucleus of a tissue cell from a pig. If we look more closely at the
chromosomes, we find that each is a long, coiled molecule of deoxyribonucleic
acid (DNA). The double helix structure of DNA (Figure 2) is by now very familiar
to most of us.
Figure 1. 38 chromosomes (19 pairs) present in diploid cells of the pig
Figure 2. Structure of deoxyribonucleic acid (DNA)
Looking even more closely at the structure of DNA, we find a series, or
sequence, of what are called nitrogenous bases along the double helix. The four
bases are represented by the letters A, T, C and G (adenine, thymine, cytosine
and guanine). It is the sequence of these four bases along the DNA molecule that
is the genetic message. That message is the set of instructions for building a
specific protein in the cell. The DNA sequence is transcribed into messenger RNA
(mRNA), which is carried to the cytoplasm of the cell, where it is translated
and the protein is built.
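As a toy illustration of the flow from DNA to RNA to protein just described, the Python sketch below assumes the DNA given is the coding strand (so the mRNA has the same sequence with U in place of T) and includes only a handful of entries from the standard genetic code. The names transcribe and translate are illustrative only and do not come from any library.

```python
# Minimal sketch: coding-strand DNA -> mRNA -> protein.
# Only a small, illustrative subset of the standard genetic code is included.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read the mRNA three bases (one codon) at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError if codon not in this toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCTTGGAAATAA"
mrna = transcribe(dna)
print(mrna)             # AUGGCUUGGAAAUAA
print(translate(mrna))  # MAWK
```

Each amino acid letter stands for one building block of the protein. A real gene is thousands of bases long and is processed in more steps than this sketch shows.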
How can a set of instructions for building proteins account for everything we
know about the power of genetics and inheritance? How can it determine skin
color or baldness, or an animal's potential to grow efficiently or produce a
quality product? If you consider some of the roles of proteins in the body
(hormones, enzymes, pigments, structural proteins, regulators of gene
expression, etc.), you can start to understand the power of this message and
the power of genes.
So, “What is a gene?” It is a sequence of DNA at a specific point on a
chromosome that carries the instructions for building a specific protein.
To appreciate the challenges associated with genomics, it is also important to
understand the vastness of the genome. It is estimated that the DNA making up
one copy of each of the 19 different chromosomes in a pig cell contains in
total a sequence of approximately 3 billion bases (A’s, T’s, C’s and G’s).
While not all of this sequence is instructions for making protein (much of it
has no known function), there are estimated to be several tens of thousands of
genes (sets of instructions for specific proteins) among the billions of bases.
A fundamental tool of genomics is the genetic marker. A genetic marker is a
detectable difference in DNA sequence that allows a specific point on the
chromosome to be ‘tagged’ and followed through the generations of a family
using a laboratory test. Some markers are based on a DNA sequence within a
gene, but many are based on DNA sequence between genes and simply tag an
anonymous point along the chromosome. The concept of using these markers to
look for important genes is then relatively simple (see Figure 3): if enough
markers are placed on each of the chromosomes, then wherever the genes we seek
are, they must be located near (linked to) a marker. By determining that a
marker is associated with a difference in a trait of interest, we have
determined the approximate location of a gene involved (i.e., there must be a
gene affecting the trait somewhere near that marker). Effective use of markers
in this way to “scan the genome” for important genes requires a dense enough
set of markers along each of the chromosomes, an appropriate family structure
and enough progeny to accurately determine which markers are truly associated
with a difference in phenotype. The application of this approach in the pig
will be reviewed later in this paper.
Figure 3. Concept of using markers along each of the chromosomes to detect QTL.
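The genome-scan idea above can be sketched as a single-marker association test: regress the phenotype on each marker's genotype and see which markers explain the most variation. The Python sketch below uses hypothetical simulated data (genotypes coded 0, 1 or 2 copies of allele B, one QTL placed at an assumed marker index). Markers are simulated independently, so only the QTL marker carries signal. In a real scan, linked neighbouring markers would also show weaker associations, and significance thresholds would need to allow for the many markers tested.

```python
import random
from statistics import mean

def simulate(n_animals=500, n_markers=20, qtl_marker=7, effect=0.5, seed=1):
    """Hypothetical data: genotypes coded 0/1/2 at each marker; the phenotype
    depends only on the genotype at qtl_marker plus normal noise (SD 1)."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_markers)]
                 for _ in range(n_animals)]
    phenotypes = [effect * g[qtl_marker] + rng.gauss(0.0, 1.0) for g in genotypes]
    return genotypes, phenotypes

def r_squared(x, y):
    """Proportion of phenotypic variance explained by a linear regression on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def scan(genotypes, phenotypes):
    """Single-marker scan: one regression per marker, returning R^2 for each."""
    n_markers = len(genotypes[0])
    return [r_squared([g[m] for g in genotypes], phenotypes) for m in range(n_markers)]

genotypes, phenotypes = simulate()
results = scan(genotypes, phenotypes)
best = max(range(len(results)), key=results.__getitem__)
print("strongest association at marker", best)
```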
Complex Quantitative Traits and Traditional Animal Breeding
Although some
traits are inherited in a simple manner because they are controlled by one or
two genes and are relatively unaffected by the environment (e.g., coat color),
the majority of traits of great economic importance to the pork industry are
much more complex. Traits like growth rate, efficiency, % lean, water-holding
capacity, intramuscular fat, age of puberty, and litter size are quantitative
traits, each affected by many genes with relatively small individual effects
and to varying degrees by the environment.
In a population of pigs, genetic variation for one
of these quantitative traits is due to differences among animals in the forms
(alleles) of these many genes that they possess. Genetic improvement comes from
predicting the value of the alleles that an animal possesses and selecting
those that will be of greatest value as parents. With traditional animal
breeding methods, we have been forced to take a “black box” approach in which
the value of the alleles an animal possesses is estimated without any knowledge
of the identity or location of the genes involved. Many past NSIF sessions have
highlighted the now classical methods of Best Linear Unbiased Prediction (BLUP),
in which phenotypic information for traits of economic importance and the
genetic relationships among the animals in the population, both those with and
without records, are used to estimate the value of the gene alleles possessed
by an animal (i.e., its breeding value). For many traits these conventional
methods are very effective. For example, for a trait like average daily gain
(ADG) that can be easily measured in both sexes before selection takes place and
is moderately heritable, response per generation from traditional selection is
considerable.
But
for sex-limited or lowly heritable traits (e.g., litter size), traits only
measured effectively in the carcass (e.g., meat quality) or traits that are
simply difficult and expensive to measure (e.g., feed intake), improvement with
traditional methods is more challenging. These are the types of traits for
which the potential benefits from genomics are greatest. Simply put,
genomics is a way to open the black box and look inside, with the objective of
using information regarding the individual gene alleles that an animal
possesses to make more effective selection decisions and enhance the rate of
genetic improvement.
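Why lowly heritable and sex-limited traits are harder to improve can be illustrated with classical selection-index results. In the simplest case (one record on the animal itself, no relatives, known population mean), the estimated breeding value is h²(P − mean) and its accuracy is √h². A boar has no litter-size record of his own, so he must wait for daughters' records, which arrive only after they farrow. The Python sketch below assumes illustrative heritabilities (0.30 for ADG, 0.10 for litter size) that are not taken from this paper, and it ignores common-environment effects.

```python
from math import sqrt

def ebv_own_record(phenotype, pop_mean, h2):
    """Simplest BLUP case: one own record, no relatives, known population mean."""
    return h2 * (phenotype - pop_mean)

def accuracy_own_record(h2):
    """Correlation between true and estimated breeding value from one own record."""
    return sqrt(h2)

def accuracy_progeny_test(h2, n_progeny):
    """Accuracy from the mean of n half-sib progeny records (no common environment)."""
    return sqrt(n_progeny * h2 / (4 + (n_progeny - 1) * h2))

# Illustrative heritabilities (assumed values, not from this paper)
for trait, h2 in [("growth rate (ADG)", 0.30), ("litter size", 0.10)]:
    print(f"{trait}: own-record accuracy = {accuracy_own_record(h2):.2f}")

# A boar has no litter-size record; with 20 recorded daughters:
print(f"litter size, 20 daughters: {accuracy_progeny_test(0.10, 20):.2f}")
```

The progeny test recovers accuracy, but only after a long delay. Markers linked to QTL could, in principle, supply some of that information at birth.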
Before we
move on to the next section, here is a summary of some important terms:
· Locus is the specific location along the chromosome of a gene or marker
(plural is “loci”).
· Quantitative Trait Locus (QTL) is the location of a gene affecting a
quantitative trait.
· Genetic Marker is a detectable difference in DNA sequence that can be used to
‘tag’ a specific point on the chromosome.
· Linked Marker is a marker located near (i.e., linked to) an important gene
(e.g., linked to a QTL).
· Marker-Assisted Selection is the use of information on linked markers, in
combination with phenotypic information, to estimate breeding values and select
animals to be parents.
· Recombination is the exchange of pieces between the two chromosomes of a pair
that occurs naturally every time sperm and eggs are produced. This process can
break down the relationship between a linked marker and a QTL. Consequently,
only markers very closely linked to a QTL can be used effectively in
marker-assisted selection.
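The last definition explains why only very close markers are useful, and this can be quantified. The Python sketch below assumes Haldane's map function (no crossover interference) to convert map distance in centiMorgans into a recombination fraction r. It then follows one chromosome, giving the chance that a particular marker allele and QTL allele are still transmitted together after a number of meioses, (1 − r)^n. The distances and the 10 meioses used are arbitrary illustrative values.

```python
from math import exp

def haldane_recombination(distance_cm):
    """Recombination fraction r from map distance (cM) under Haldane's map
    function, which assumes no crossover interference."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - exp(-2.0 * d_morgans))

def prob_no_recombination(distance_cm, meioses):
    """Chance that a marker allele and a QTL allele on the same chromosome
    stay together through the given number of meioses."""
    return (1.0 - haldane_recombination(distance_cm)) ** meioses

for cm in (1, 5, 20):
    print(cm, "cM:", round(prob_no_recombination(cm, 10), 3))
```

A marker 1 cM from the QTL usually stays with it for many generations, while one 20 cM away rarely does.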
“Who are these genes and where are they hiding?”
So
if genomics is the means by which we open up the black box and identify the
genes that affect traits of economic importance to our industry, let’s hurry up
and open the box! Two main points of the previous sections emphasize the major
challenges encountered in genomics: 1) the complexity of important quantitative
traits (each trait affected by many genes with relatively small individual
effects, and by the environment), and 2) the vastness of the genome (tens of
thousands of genes residing in a sequence of billions of bases).
Although the challenges are significant, the technology exists for discovering
many of the genes affecting important traits and for using them in genetic
selection programs. In the mid- and late-1990s, laboratory tests were developed
for anonymous markers throughout the genomes of livestock species. Figure 4
shows an example from the late 1990s: a set of markers for chromosome 6 in the
pig, developed and made available through the USDA Meat Animal Research Center
(MARC) in Clay Center, NE. The availability of these markers made possible the
first scans of the genome for important livestock genes, and several of these
scans have now been reported for the pig (e.g., Andersson et al., 1994; Bidanel
et al., 2001; Casas-Carrillo et al., 1997; Cassady et al., 2001; Malek et al.,
2001; Paszek et al., 1999).
Figure 4. USDA linkage map for porcine chromosome 6.
These
initial scans in livestock and other mammalian species have yielded results
largely consistent with expectations based on the characteristics of
quantitative traits, and have demonstrated the challenges associated with
taking genomics information to the point of application:
· Several regions of the genome contain genes causing variation in a given
quantitative trait, and most of these genes have small to moderate effects on
the trait. For example, using a powerful half-sib (daughter) design in dairy
cattle, Mosig et al. (2001) were able to detect most of the genes contributing
to genetic variation in milk protein percentage. They reported that between 23
and 28 QTL on 22 different chromosomes account for nearly 100% of the genetic
variation in this single trait. Using reported results from genome scans in
dairy cattle and pigs, Hayes and Goddard (2001) estimated the likely
distribution of QTL effects for a typical quantitative trait. They concluded
that many QTL of small effect and few of large effect are likely to contribute
to the genetic variation in production traits, and that genome scans of great
power will be needed to detect the QTL that together explain a majority (> 50%)
of that variation.
· Many of the important chromosomal regions revealed in genome scans do not
correspond with the known locations of genes whose biological functions relate
to the trait. This is true even in species like the mouse, for which the
biological function of a large number of genes is known. Although the mouse
genome has been scanned several times for genes that cause variation in growth
rate, most of the significant regions do not correspond with the locations of
known genes with growth-related functions, and none of these scans has
identified the growth hormone gene, which might seem to have an obvious role,
as a contributor to genetic variation (Pomp, 1997). It is important here to
remember that a gene can have an important function in the biology of a trait
but not contribute to genetic variation, because it is fixed in the population
(i.e., has only one form) or because its alleles do not differ significantly
in effect. An example of this in the pig is a scan conducted by Rohrer et al.
(2001) for genes contributing to genetic variation in plasma FSH concentration.
Several chromosomal regions with significant effects on FSH concentration were
detected, but they did not include the region known to contain the FSH gene.
These results indicate that many of the genes that cause genetic variation in a
trait, and thereby provide the raw material for genetic selection, cannot be
predicted at the beginning of a gene search (e.g., they may be genes that code
for proteins regulating the expression of other genes).
· Initial scans of the genome, if designed appropriately and with adequate
power, can allow the detection of chromosomal regions containing QTL with
large, moderate or even small effects on important traits. But the size of the
region typically detected makes the application of marker-assisted selection,
let alone identification of the responsible gene(s), difficult. For example,
most reported individual QTL in pigs have been mapped to chromosomal regions
that include 20 to 40 million bases of DNA sequence and perhaps 600 to 1,200
genes. Marker-assisted selection can only be applied effectively after an
additional process of verification and fine-mapping, in which markers very
closely linked to the QTL (causal gene) are associated with the trait(s) of
interest (Figure 5).
Figure 5. Concept of fine-mapping a detected QTL
Thus, a genomics program aimed at harvesting
information on a large proportion of the individual genes affecting a
quantitative trait must assume little or no prior knowledge of the identity or
location of those genes and cast a wide net through powerful genome scans. Even
with powerful detection, the identity of the genes causing variation in
quantitative traits will remain very elusive. Fortunately, the true identity of
the causal gene is not required for effective marker-assisted selection, rather
what is needed is a detailed knowledge of the location of the gene. Reaching
that point requires a broad and powerful genomics platform.
A Comprehensive and Powerful Genomics Platform
A
genomics platform built for successful discovery and application must address
three primary needs: 1) detection and mapping of a large proportion of the QTL
affecting important quantitative traits through the association of markers to
phenotypes, 2) fine-mapping of the detected QTL by identifying informative
markers linked closely to the causal gene, and 3) verification of both the
direct and correlated effects of the QTL in the target population(s) and
effective use of the finely mapped QTL to increase the rate of genetic improvement
realized by the customer.
Figure 6. Core technologies of the Monsanto/DEKALB Choice Genetics Genomics Platform
Although
many genome scans have now been reported in the pig, few have been based on
large enough populations to result in a high probability of finding QTL with
small effects (e.g., less than 0.25 phenotypic standard deviations). In many cases, marker
genotypes and phenotypic data are needed on several thousands of animals to
achieve the desired power of detection. At Monsanto/DEKALB Choice Genetics we
implement genome scans in a variety of internal populations, some specifically
produced for powerful gene mapping, so that a large proportion of the QTL
affecting traits of great importance to our customers and segregating
(available for selection) in our nucleus lines will be discovered. This
requires not only access to the appropriate populations, but also tremendous
resources to assure the collection of high quality genotypes and phenotypes on
a high-throughput scale. A paper by Dr. Sam Buttram in these proceedings
focuses on the collection and processing of quality data at DEKALB Choice
Genetics.
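Why thousands of animals are needed can be seen from a standard sample-size calculation. The Python sketch below treats QTL detection, as a simplifying assumption, as a two-sided z-test comparing two genotype groups whose means differ by effect_sd phenotypic standard deviations. The default alpha of 5e-5 is an assumed stand-in for a stringent genome-wide threshold across many markers. Real designs (F2 crosses, half-sib families) infer QTL genotypes from linked markers and generally need more animals than this idealized contrast.

```python
from math import ceil
from statistics import NormalDist

def animals_per_group(effect_sd, alpha=5e-5, power=0.80):
    """Approximate animals needed in each of two genotype groups to detect a
    mean difference of effect_sd phenotypic SDs with a two-sided z-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_sd ** 2)

for d in (0.50, 0.25, 0.10):
    print(f"effect {d:.2f} SD: about {animals_per_group(d)} animals per group")
```

Halving the effect size roughly quadruples the number of animals required, which is why QTL of less than 0.25 phenotypic SD are rarely found in small populations.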
Tremendous resources are also needed to make the step from a detected QTL, in a
chromosomal region containing hundreds of genes, to a finely mapped QTL for
which the relationship between markers and QTL will be relatively unaffected by
recombination, a step that is essential for effective marker-assisted
selection. In addition to optimally designed families from the populations
targeted for selection and powerful statistical tools, a critical need in this
step is a large number of informative markers within each region containing an
important QTL. Resources at Monsanto/DEKALB Choice Genetics like the porcine
bacterial artificial chromosome (BAC) map (Warren et al., 2001), a collection of
cloned DNA pieces representing all the pig chromosomes, will provide quicker
access to new markers in QTL regions throughout the genome. Comparative
mapping, which takes advantage of similarities among the genomes of mammalian
species to tap into the large amount of genomic information from the human and
mouse, and transcript profiling, a method of monitoring the gene expression that
accompanies biological processes related to the trait, are also important parts
of the Monsanto/DEKALB Choice Genetics platform. Both can be used effectively at
this point to help generate new markers and suggest candidates for the causal
gene. These processes quickly generate millions of pieces of information that
must be processed and interpreted with the goal of refining the QTL map.
Bioinformatics is a new scientific discipline that combines mathematics,
computer science and molecular biology to deal with the massive amount of
information necessary for fine-mapping of the genome. New and unique tools for
the pig have been developed in the bioinformatics group at Monsanto/DEKALB
Choice Genetics (Veenhuizen, 2000), but we also benefit from bioinformatics
tools and previous experience that exist within other areas of the company.
Before marker-assisted selection occurs, each QTL must be evaluated in the
population(s) targeted for selection. Adequate fine-mapping means that the
evaluation can occur across families, but for each QTL the allele frequencies
and the direct and correlated effects must still be determined. This process is
accelerated by mapping the QTL in the same populations targeted for selection.
Verified QTL are then available for incorporation into the breeding program.
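The allele frequency and direct effect of a verified QTL in a target line can be summarized with classical single-locus quantitative genetics (Falconer's definitions). The Python sketch below assumes a biallelic QTL with alleles A and B, where B is the allele whose homozygote is taken as +a. The genotype counts and means are hypothetical inputs, and only one trait is shown. Correlated effects would be obtained by repeating the calculation for each other trait of interest.

```python
def allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of allele B estimated from genotype counts."""
    return (2 * n_bb + n_ab) / (2 * (n_aa + n_ab + n_bb))

def qtl_effects(mean_aa, mean_ab, mean_bb, freq_b):
    """Falconer-style summary of a biallelic QTL from genotype means.
    a: additive effect (half the homozygote difference)
    d: dominance deviation of the heterozygote from the homozygote midpoint
    alpha: average effect of substituting B for A, a + d(q - p), p = freq of B
    var_a: additive genetic variance contributed by the QTL, 2pq * alpha^2"""
    p, q = freq_b, 1.0 - freq_b
    a = (mean_bb - mean_aa) / 2.0
    d = mean_ab - (mean_aa + mean_bb) / 2.0
    alpha = a + d * (q - p)
    var_a = 2.0 * p * q * alpha ** 2
    return a, d, alpha, var_a

# Hypothetical genotype counts and trait means for one QTL in one line
p_b = allele_frequency(n_aa=160, n_ab=480, n_bb=360)
print(p_b, qtl_effects(100.0, 105.0, 106.0, p_b))
```

The average effect alpha depends on allele frequency, which is one reason a QTL must be evaluated in each population targeted for selection.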
Application of Genomics Information – Enhancing Traditional Pig Breeding
It
is important to point out that genomics is not a replacement for quantitative
genetics and animal breeding, but for certain aspects of production and product
quality has the potential to greatly enhance conventional breeding methods. At
Monsanto/DEKALB Choice Genetics there is a solid base of product lines and
quantitative animal breeding practices upon which our genomics program is
built. Quantitative and molecular breeders work in concert to ensure that
maximum value will be realized from use of genomics information in the breeding
program.
Recall the traits for which conventional animal breeding methods encounter the
greatest barriers and genomics potentially has the most to offer: sex-limited
traits, lowly heritable traits, traits expressed late in life (after
selection), and traits that are difficult or expensive to measure. Once QTL
have been verified as described above,
- markers for sex-limited traits can be genotyped in both males and females,
- markers for lowly heritable traits can provide ‘environment-free’ information
regarding genetic merit,
- markers for traits expressed later in life can be genotyped at the earliest
stages (e.g., potentially in the embryo), and those for meat quality traits can
be genotyped directly in the live animal (breeding candidate), and
- markers for traits that are difficult or expensive to measure can reduce the
need for these phenotypic measurements.
These features of markers
can be exploited to impact the rate of genetic improvement through greater
accuracy of selection, greater intensity of selection from the testing (through
genotyping) of a greater number of animals, and in some cases shorter
generation interval because animals can be measured earlier in life.
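These three routes correspond to the terms of the standard expression for genetic change per year, ΔG = i·r·σA / L. Here i is selection intensity, r is accuracy, σA is the additive genetic standard deviation and L is the generation interval in years. The Python sketch below uses entirely hypothetical values for a litter-size scenario, chosen only to show how modest gains in each term compound. They are not estimates from this paper or from Figure 7.

```python
def response_per_year(intensity, accuracy, sigma_a, generation_interval):
    """Expected genetic change per year: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / generation_interval

# Hypothetical litter-size scenario (all values assumed for illustration)
sigma_a = 0.45  # additive genetic SD, pigs per litter
baseline = response_per_year(intensity=1.4, accuracy=0.35,
                             sigma_a=sigma_a, generation_interval=1.8)
with_markers = response_per_year(intensity=1.6, accuracy=0.45,
                                 sigma_a=sigma_a, generation_interval=1.5)
print(f"baseline: {baseline:.3f} pigs/litter per year")
print(f"with markers: {with_markers:.3f} pigs/litter per year "
      f"(+{100 * (with_markers / baseline - 1):.0f}%)")
```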
The marker information
can be used in a variety of ways to rank animals for selection. In the simplest
cases, for example for a known undesirable mutation such as the halothane
allele, marker information can be used to eradicate the allele from the
population. Markers closely linked to QTL affecting complex traits can be used
in combination with phenotypic information available on the individual or
relatives in various forms of marker-assisted genetic evaluation. The most
complete integration of marker information for complex traits into the
traditional genetic evaluation is through marker-assisted BLUP (Fernando and
Grossman, 1989). Marker-assisted BLUP generates the same sort of breeding
values familiar to swine breeders, but with enhanced accuracy due to the incorporation
of marker information.
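The halothane example shows why a DNA test is so much more powerful than a phenotypic test for a recessive allele. Under random mating, culling only the homozygous recessive animals (the ones a phenotypic halothane challenge identifies) reduces the allele frequency slowly, following q' = q / (1 + q), because heterozygous carriers escape detection. A DNA test also identifies carriers, so using only homozygous normal parents removes the allele in a single generation. The starting frequency of 0.30 in the Python sketch below is hypothetical.

```python
def next_freq_phenotypic_cull(q):
    """Recessive allele frequency after culling only homozygous recessives
    (random mating, Hardy-Weinberg proportions): q' = q / (1 + q)."""
    return q / (1.0 + q)

def generations_to_reach(q0, target):
    """Generations of homozygote-only culling needed to push q to target or below."""
    q, gens = q0, 0
    while q > target:
        q, gens = next_freq_phenotypic_cull(q), gens + 1
    return gens

q0 = 0.30  # hypothetical starting frequency of the recessive allele
print("phenotypic culling, generations to reach q <= 0.01:",
      generations_to_reach(q0, 0.01))
print("DNA test, culling carriers too: 1 generation to reach q = 0")
```

The phenotypic route takes close to a century of pig generations to reach 1%, which is why a diagnostic marker makes eradication practical.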
There are many estimates of the potential value of marker-assisted selection in
livestock breeding programs. Results from a simulation study by Meuwissen and
Goddard (1996) are summarized in Figure 7, and reaffirm that it is for the
traits most difficult to improve with conventional methods that genomics holds
the greatest potential value for the pork industry.
Figure 7. Predicted additional response (%) for marker-assisted selection
Conclusions
Genomics is the study of how specific genes are
linked to observable traits. For traits that are difficult to improve with
conventional swine breeding methods, genomics offers significant potential
value. But a comprehensive and well-resourced genomics platform is required to
overcome the great challenges associated with discovering QTL that explain a
large proportion of the genetic variation in important complex traits,
fine-mapping those QTL to allow for effective marker-assisted selection, and
finally verifying and capturing their value in industry populations.
References
Andersson, L., C.S. Haley, H. Ellegren, S.A. Knott,
M. Johansson, K. Andersson, L. Andersson-Eklund, I. Edfors-Lilja, M. Fredholm,
I. Hansson, J. Hakansson and K. Lundstrom. 1994. Genetic mapping of
quantitative trait loci for growth and fatness in pigs. Science 263:1771-1774.
Bidanel, J., D. Milan, N. Iannuccelli, Y. Amigues,
M. Boscher, F. Bourgeois, J. Caritez, J. Gruand, P. Le Roy, H. Lagant, R.
Quintanilla, C. Renard, J. Gellin, L. Ollivier and C. Chevalet. 2001. Detection
of quantitative trait loci for growth and fatness in pigs. Genet. Sel. Evol.
33:289-309.
Casas-Carrillo, E., A. Prill-Adams, S.G. Price, A.C. Clutter and B.W.
Kirkpatrick. 1997. Mapping genomic regions associated with growth rate in pigs.
J. Anim. Sci. 75:2047-2053.
Cassady, J.P., R.K. Johnson, D. Pomp, G.A. Rohrer, L.D. Van Vleck, E.K.
Spiegel and K.M. Gilson. 2001. Identification of quantitative trait loci
affecting reproduction in pigs. J. Anim. Sci. 79:623-633.
Fernando, R.L., and M. Grossman. 1989. Marker assisted selection using best
linear unbiased prediction. Genet. Sel. Evol. 21:467-477.
Hayes, B., and M.E. Goddard. 2001. The distribution of the effects of genes
affecting quantitative traits in livestock. Genet. Sel. Evol. 33:209-229.
Malek, M., J.C.M. Dekkers, H.K. Lee, T.J. Baas, K. Prusa, E. Huff-Lonergan and
M.F. Rothschild. 2001. A molecular genome scan analysis to identify chromosomal
regions influencing economic traits in the pig. II. Meat and muscle
composition. Mamm. Genome 12:637-645.
Meuwissen, T.H.E., and M.E. Goddard. 1996. The use of marker haplotypes in
animal breeding schemes. Genet. Sel. Evol. 28:161-176.
Mosig, M.O., E. Lipkin, G. Khutoreskaya, E. Tchourzyna, M. Soller and A.
Friedmann. 2001. A whole genome scan for quantitative trait loci affecting milk
protein percentage in Israeli-Holstein cattle, by means of selective milk DNA
pooling in a daughter design, using an adjusted false discovery rate criterion.
Genetics 157:1683-1698.
Pomp, D. 1997. Genetic dissection of obesity in polygenic animal models.
Behavior Genetics 27:285-306.
Rohrer, G.A., T.H. Wise, D.D. Lunstra and J.J. Ford. 2001. Identification of
genomic regions controlling plasma FSH concentrations in Meishan-White
composite boars. Physiol. Genomics 6:145-151.
Veenhuizen, J. 2000. Bioinformatics and swine genetic improvement. Pages 84-86
in Proceedings of the Annual Conference of the National Swine Improvement
Federation, Nashville, TN.
Warren, W., N. Tao, B. Barbazuk, T. Allison, T. Landwe, H. Chou, R. Kaegy, M.
Maloney and S. Hall. 2001. A BAC-based physical map of the Sus scrofa genome.
P589. Proceedings of the IX Plant and Animal Genome International Conference,
San Diego, CA.