Management and Use of Commercial Data for Genetic Improvement


Scott Newman


Franklin, KY




Genetic evaluation is accomplished through computer services processing pedigree information and performance records for one or more traits. The process varies from species to species. For example, beef breeders systematically record and submit their records to a breed association. Pig breeding companies record and process their own records, but the analysis and evaluation process is similar.


The choice of traits to measure and record by a group of breeders (or a breeding company) involves some collective judgment about what traits are important and what traits can be routinely measured in the breeders’ (breeding) operations. Modern genetic evaluation procedures usually involve the statistical methodology termed BLUP (Best Linear Unbiased Prediction) and use complete relationship records for several generations whenever possible. These procedures use so-called “individual animal” models to simultaneously evaluate progeny, parents and ancestors, including those without direct records of performance. Evaluation may be on either a single- or multi-trait basis. Use of the latter is increasing, provided that sufficient computational speed and storage are available. Moreover, multi-trait evaluation is a prerequisite for the use of crossbred information.
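As an illustration of how an individual animal model evaluates animals with and without records, the following is a minimal single-trait sketch. It assumes one fixed effect (an overall mean), a known heritability, and a toy pedigree; the function names and data are hypothetical and are not taken from any evaluation system described here.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix by the tabular method.
    pedigree: list of (sire, dam) indices (None if unknown), parents listed first."""
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        for j in range(i):
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
    return A


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def inverse(M):
    """Invert a square matrix one unit-vector column at a time."""
    n = len(M)
    cols = [solve(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def blup_animal_model(pedigree, records, h2):
    """Build and solve the mixed model equations for y = mean + animal + error.
    records: dict mapping animal index -> phenotype. Returns (mean, list of EBV)."""
    n = len(pedigree)
    lam = (1.0 - h2) / h2  # ratio of residual to additive variance
    Ainv = inverse(relationship_matrix(pedigree))
    size = n + 1
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    lhs[0][0] = float(len(records))
    rhs[0] = float(sum(records.values()))
    for i in range(n):
        for j in range(n):
            lhs[i + 1][j + 1] = lam * Ainv[i][j]
    for animal, y in records.items():
        lhs[0][animal + 1] += 1.0
        lhs[animal + 1][0] += 1.0
        lhs[animal + 1][animal + 1] += 1.0
        rhs[animal + 1] += y
    solution = solve(lhs, rhs)
    return solution[0], solution[1:]


# Hypothetical toy data: animals 0 and 1 are unrecorded parents,
# animals 2 to 4 are their progeny, and animal 4 has no record.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (0, 1)]
records = {2: 110.0, 3: 90.0}
mean, ebv = blup_animal_model(pedigree, records, h2=0.3)
```

In this sketch the unrecorded animal 4 still receives a breeding value through its pedigree links, which is the property that allows relatives' records (including, in principle, crossbred relatives) to contribute to an evaluation.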


Most genetic evaluation that occurs in livestock is based chiefly on purebred (or at least straightbred) populations. An enormous number of experiments all over the world have characterized breeds and breed crosses for additive and nonadditive genetic effects. It is only natural to assume that methodologies for genetic evaluation of purebreeding populations could be extended to accommodate crossbred information, including that from composite populations. However, this is not always the case. In a pig-breeding pyramid, the number of crosses (or percentage combinations of genotypes) is under greater control than in beef cattle production, where a greater number of commercial records are generated from crossbred matings of varying proportions. Because of this, the number of parameters to estimate for use in crossbred genetic evaluation can become unwieldy.


This paper discusses some of the issues involved in the collection, analysis and interpretation of crossbred data in genetic improvement. Discussion is also provided on the impact that crossbred information has on breeding population structure and response.


Why use crossbred information?


At best, genetic evaluation at the genetic nucleus (GN) level is a commercial line evaluation, rather than an evaluation of the crossbred (commercially marketed) product. The reasons for collecting commercial information for use in genetic evaluation include reduction in the effects of genotype x environment interaction, performance-test specialization (measuring genetic potential vs. measuring environmental sensitivity; see below), and commercial genetic marker evaluation. This information can also be valuable for developing customized breeding programs for individual customers.


In general, the addition of crossbred information helps to redefine the role performance testing has in a genetic improvement framework. Performance-testing information collected at the GN level of a breeding pyramid is primarily concerned with helping to reveal genetic potential. The test environment at the GN is standardized, through uniform nutrition, health, etc., to allow maximum expression of genetic potential. Crossbred information broadens what performance testing can provide: it captures environmental sensitivity, that is, the non-genetic effects (so-called metabolic load) that can suppress the expression of genetic potential in performing animals. The test environment(s) can thus provide the opportunity to measure performance under varying levels of metabolic load.


Crossbred information also aids in increasing the accuracy of genetic evaluation through the addition of crossbred relatives’ records. Increasing family information is important in evaluation of traits of low heritability, i.e., traits of high environmental influence, traits that are expensive or difficult to measure, as well as sex-limited traits. Examples include processing quality, meat quality, robustness and survival. Crossbred information also facilitates the elucidation of genetic variation in different environments. As will be discussed later, the addition of crossbred records can also increase inbreeding, thus the addition of these records to influence genetic gain needs to be balanced with opportunities for minimizing progeny inbreeding.


Collection and management of crossbred information


The collection and use of crossbred information can have an effect on the number of available selection candidates (Bijma et al., 2001), depending upon whether the crossbred information collected is part of the GN breeding program, or is collected from the commercial population. Many breeding organizations do not have direct access to crossbred information, but do have the capability of contracting out the collection of commercial data that is then integrated into their information systems for further processing. This requires a very precise degree of coordination to accommodate:


·        Use of the same sire for repeat services;

·        Recording of sow fallout;

·        Unique identification of “test” females;

·        Tagging of test pigs;

·        Accurate recording of pig movement to nursery and finisher;

·        Accurate recording of survival;

·        Harvest date coordination and data collection at the processing plant;

·        Expedient entry of data into a common database.


The degree of complexity of the data-recording scheme is a function of the traits to be collected as well as the structure and function of the test facility. The facility may be a multiplier or experimental farm closely associated with GN operations, that is, a farm where other experiments or special procedures have been implemented in the past. In such a facility the superimposition of new data recording schemes is allowed for in standard operating procedures (SOPs). However, most collection of crossbred information will take place at commercial farms with relatively highly structured SOPs, where training will be required to implement new procedures. For example, many commercial-level facilities do not keep ongoing farrowing records. The key is to balance the need for new data collection capabilities with the daily achievement of production goals for the farm.


New traits that can be considered in the collection of crossbred information include meat quality, processing quality, performance under different diets, birth weight, survival (mortality), and robustness. The least drain on system resources would be to collect abattoir information on commercial pigs. This would require identification of the piglets at birth (combined with GN sire identification), and the ability to uniquely identify the carcass at processing. This will allow the collection of any carcass or meat quality data required, and will also yield an age (by knowledge of birth date and market date). To collect mortality data would then require the collection of tags from dead animals and a way to record the date of removal as well as reason for removal (which is often more difficult to collect).
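The record linkage described above can be sketched as a simple join on the pig's tag. This is only an illustration under assumed record layouts; the field names, tag formats and values are hypothetical.

```python
from datetime import date


def build_test_records(births, carcasses, removals):
    """Join birth, carcass and removal records on the pig's tag.
    Age is derived from birth date and harvest (or removal) date."""
    linked = {}
    for tag, birth in births.items():
        rec = {"sire": birth["sire"], "status": "unknown", "age_days": None}
        if tag in carcasses:
            carcass = carcasses[tag]
            rec["status"] = "harvested"
            rec["age_days"] = (carcass["harvest_date"] - birth["birth_date"]).days
            rec["carcass_weight_kg"] = carcass["carcass_weight_kg"]
        elif tag in removals:
            removal = removals[tag]
            rec["status"] = "removed"
            rec["age_days"] = (removal["removal_date"] - birth["birth_date"]).days
            # The reason for removal is often missing, so allow None.
            rec["removal_reason"] = removal.get("reason")
        linked[tag] = rec
    return linked


# Hypothetical example data.
births = {
    "T001": {"sire": "GN-S1", "birth_date": date(2002, 1, 10)},
    "T002": {"sire": "GN-S1", "birth_date": date(2002, 1, 12)},
    "T003": {"sire": "GN-S2", "birth_date": date(2002, 1, 12)},
}
carcasses = {"T001": {"harvest_date": date(2002, 7, 9), "carcass_weight_kg": 92.5}}
removals = {"T002": {"removal_date": date(2002, 2, 1)}}

linked = build_test_records(births, carcasses, removals)
```

A pig such as T003, with neither a carcass record nor a removal record, keeps the status "unknown", which flags exactly the kind of tracking gap that coordinated recording is meant to prevent.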


Crossbred information in genetic evaluation[1]


Multi-line (or -breed) evaluations allow the comparison of animals of different lines utilizing information from pooled datasets, or datasets from single lines that include measurements on animals of various lines and line compositions. Multi-line evaluations have the capacity to include evaluation of crossbred progeny by allowing estimation of breed and heterosis effects (given wise definition of contemporary groups), and to incorporate comparisons between purebred and proportional individuals in the same within-line genetic evaluation. As an aside, in beef cattle multi-breed genetic evaluation also allows composite breeds to be combined with the established breeds from which they were derived, increasing the accuracy of evaluation by utilizing more relatives’ information (e.g. combining Simmental and Simbrah, Angus and Brangus, Hereford and Braford).
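To make the idea of line composition and heterosis covariates concrete, the sketch below computes the expected line fractions of a mating and the expected fraction of loci carrying alleles from two different lines. It assumes the common simplification that heterosis is proportional to this breed heterozygosity; the function names are illustrative only.

```python
def offspring_composition(sire, dam):
    """Expected line fractions of progeny: the average of the parents' fractions."""
    lines = set(sire) | set(dam)
    return {b: 0.5 * sire.get(b, 0.0) + 0.5 * dam.get(b, 0.0) for b in lines}


def expected_heterozygosity(sire, dam):
    """Probability that the two alleles at a locus trace back to different lines."""
    same_line = sum(p * dam.get(b, 0.0) for b, p in sire.items())
    return 1.0 - same_line


line_a = {"A": 1.0}
line_b = {"B": 1.0}
f1 = offspring_composition(line_a, line_b)        # terminal cross A x B
f1_heterozygosity = expected_heterozygosity(line_a, line_b)
f2_heterozygosity = expected_heterozygosity(f1, f1)
backcross_heterozygosity = expected_heterozygosity(f1, line_a)
```

Covariates of this kind grow in number with the number of lines and cross types, which is one way to see why the parameter count in crossbred evaluation can become large.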


In general, the ultimate use of crossbred information in genetic evaluation of pigs is to provide more consistent ranking of GN sires when information from crossbred progeny raised in commercial environments is used to increase the accuracy of the resulting Estimated Breeding Values (EBV). The assumption is made that the performance of crossbred progeny from the selected sire is predictable based on information from his purebred progeny (and purebred collateral relatives), i.e. the correlation between purebred and crossbred performance is close to unity. This assumption does not hold for all traits of economic importance.


Purebred – crossbred correlations


Purebred-crossbred genetic correlations (rpc) can be used to predict selection response in crossbreds based on pure-line selection (Wei et al., 1991). These correlations reflect not only dominance levels but also gene frequency differences between populations. Both Pirchner and Mergl (1977) and Wei et al. (1991) have shown that purebred-crossbred correlations become negative for some cases of overdominance. In general, rpc decreases with increasing dominance level or gene frequency difference between parental populations. For long-term selection the value of rpc will depend on dominance and selection method. In the case of overdominance, crossbred selection is to be preferred due to a higher selection limit (Hill, 1970).


Purebred-crossbred genetic correlations also provide some clues to implementation of breeding strategies. With high rpc, crossbred performance can be improved efficiently as a correlated response to purebred selection, because the crossbred response then depends largely on the genetic covariances between additive effects in purebreds and crossbreds (Wei and Van der Steen, 1991). Alternatively, low or negative rpc favors the use of reciprocal recurrent selection schemes, especially when rpc is negative.
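The role of rpc in predicting crossbred response can be illustrated with the standard correlated-response formula, in which crossbred response to purebred selection equals selection intensity times accuracy of purebred selection times rpc times the additive standard deviation for crossbred performance. The sketch below is a simplified, single-generation illustration with hypothetical input values; it ignores dominance, inbreeding and changes in parameters over time.

```python
def response_from_purebred_selection(intensity, accuracy_pure, rpc, sigma_a_cross):
    """Correlated response in crossbred performance per generation
    when selection is on purebred breeding values."""
    return intensity * accuracy_pure * rpc * sigma_a_cross


def response_from_crossbred_selection(intensity, accuracy_cross, sigma_a_cross):
    """Response in crossbred performance when selection is directly
    on breeding values for crossbred performance."""
    return intensity * accuracy_cross * sigma_a_cross


# Hypothetical values: intensity 1.4, additive SD of 1 unit for crossbred performance.
high_rpc = response_from_purebred_selection(1.4, 0.7, 0.9, 1.0)
low_rpc = response_from_purebred_selection(1.4, 0.7, 0.4, 1.0)
direct = response_from_crossbred_selection(1.4, 0.5, 1.0)
```

With these assumed values, purebred selection outperforms the less accurate direct crossbred selection when rpc is 0.9 but falls behind it when rpc is 0.4, which mirrors the argument for using crossbred information when rpc is low.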


There are a number of estimates of rpc in the literature from most species of economic importance. Brandt and Täubert (1998) summarized rpc estimates for a number of traits in pigs. Daily gain estimates summarized from the literature ranged from 0.19 to 0.73 (estimates from their study ranged from 0.47 to 0.99). Estimates for backfat ranged from 0.21 to 0.88 (0.54 to 1.00). Recently, Lutaaya et al. (2001) estimated rpc for lifetime daily gain (0.62 to 0.99) and backfat (0.32 to 0.70) using Bayesian methods and accounting for dominance. In poultry egg production traits, Wei and Van der Werf (1995) and Besbes and Gibson (1999) reported estimates of rpc that ranged from 0.56 to 0.73, and from 0.80 to 0.94, respectively. In cattle, Newman et al. (2002) reported rpc of 0.48, 0.48, 0.83, 0.95, 1.00, and 0.78 for 400-day weight, carcass weight, retail yield, marbling, P8 fat depth, and scanned eye muscle area, respectively.


Genetic evaluation in pigs


The complexities encountered in the use of crossbred information in cattle evaluations are not as serious in pigs. There is greater control of the number of unique genotypes produced, environmental variation is under greater control, and larger family (pedigreed) groups can be generated. In both species, however, the focus is on the use of purebred and crossbred information to evaluate purebreds more accurately for better prediction of crossbred performance (Lutaaya et al., 2002).


A number of papers have been written laying out the foundation for the use of crossbred information in genetic evaluation as well as prediction of genetic improvement in livestock breeding schemes (e.g., Wei and Van der Werf, 1994; Bijma and Van Arendonk, 1998; Spilke et al., 1998; Lutaaya et al., 2002).


Lutaaya et al. (2002) studied the gains in reliability of joint purebred-crossbred evaluations for lifetime daily gain and backfat. They also looked at re-ranking issues due to the addition of crossbred information. Their work was based on results from Lo et al. (1995, 1997), who developed a model that accounted for all additive and dominance (co)variances among all crosses of two pure lines. As with other work on the development of genetic evaluation incorporating crossbred individuals (Arnold et al., 1992), the number of parameters to be estimated can be large and can render the analysis impractical. Lutaaya et al. (2002) used a simplified model in which the only cross allowed was the terminal (F1). Three models were developed: (1) a purebred model; (2) a crossbred model (incorporating additive effects in the crossbred transmitted by each parent line); and (3) an “approximate” model, which was the same as (1) but with the genetic parameters as in the largest pureline (B in this case, with 24170 records, as compared to A with 6022 and the crossbred genotype C with 6135). Reliability was estimated as



$$r^2_{ij} = 1 - \frac{PEV_{ij}}{\sigma^2_{a_j}}$$

where $r^2_{ij}$ is reliability for animal i and breed/trait j, $PEV_{ij}$ the prediction error variance, and $\sigma^2_{a_j}$ the additive variance.
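As a small worked illustration, and assuming the usual definition of reliability as one minus the ratio of prediction error variance to additive variance, the calculation can be sketched as follows. The function names and numbers are hypothetical.

```python
def reliability(pev, sigma2_a):
    """Reliability of an estimated breeding value: 1 - PEV / additive variance."""
    if sigma2_a <= 0.0:
        raise ValueError("additive variance must be positive")
    return 1.0 - pev / sigma2_a


def accuracy(pev, sigma2_a):
    """Accuracy is the square root of reliability (floored at zero)."""
    return max(reliability(pev, sigma2_a), 0.0) ** 0.5


# Hypothetical example: PEV drops from 0.80 to 0.75 after adding crossbred records.
gain_in_reliability = reliability(0.75, 1.0) - reliability(0.80, 1.0)
```

A gain in reliability, as reported in studies of this kind, is then simply the difference between the reliabilities obtained with and without the additional records.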


Gains in reliability for breed A (0.02 to 0.03) and breed B (0.01) were small. The number of crossbred records was about 25% of the number for the largest pure breed (B), but similar to that for breed A. It would be expected that reliabilities in the purebreds would be greater if the number of crossbred records were greater. For crossbred breeding values the pattern was reversed: improvement for the breeding values originating from line B (0.19 to 0.21) was greater than the improvement originating from line A (0.05 to 0.11). This was because males from line A could be relatively well evaluated using crossbred information only (line A was used predominantly as a paternal line), but females from line B could not unless records from line B were available.


Rank correlations between breeding values from models (1) and (2) were close to unity for lines A and B, but lower for the crossbred line C (0.869 for lifetime daily gain and 0.854 for backfat). Two important points here are that (1) when crossbred information makes up a small portion of the whole data set, its influence on accuracy will be small; and (2) more re-ranking will occur for crossbred breeding values when a substantial portion of the information used in the genetic evaluation is derived from purebreds.
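Re-ranking between two evaluation models can be quantified with a Spearman rank correlation of the two sets of breeding values. The following is a minimal sketch using average ranks for ties; the breeding values shown are hypothetical and the function names are illustrative.

```python
def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(ebv_model_1, ebv_model_2):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(ebv_model_1), ranks(ebv_model_2))


# Hypothetical breeding values for five animals under two models;
# the two best animals swap places under the second model.
ebv_purebred_model = [-1.2, -0.4, 0.1, 0.6, 1.5]
ebv_crossbred_model = [-0.9, -0.5, 0.2, 1.1, 0.8]
rho = rank_correlation(ebv_purebred_model, ebv_crossbred_model)
```

A correlation close to unity over all animals can still hide changes among the top-ranked candidates, so in practice it is worth also inspecting the animals that would actually be selected.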


It was interesting to note that rank correlations between breeding values from models (2) and (3) were greater than 0.96. This shows that the approximate model might be sufficiently accurate to evaluate purebred and crossbred information. As Lutaaya et al. (2002) rightly point out, this will hold only when the number of crossbred records is small. Thus, the approximate model will be of use in cases when increases in accuracy for purebred animals are required and the number of crossbred observations is small.


One of the assumptions made in this approximate analysis was that the purebred-crossbred correlation for each trait was unity. This assumption may not be valid for a number of traits of economic importance. When correlations are low, both purebred and crossbred evaluations would be of interest, along with larger numbers of crossbred records. The assumption of unity correlations may also have the effect of inflating crossbred breeding values generated from the approximate model.


There are situations where some traits are recorded only on crossbreds (e.g., postweaning survival, meat quality), where the model used by Lutaaya et al. (2002) would be of use.


As Lutaaya et al. (2002) point out, the terminal model may be of limited use in pig breeding given that commercial animals may be composed of up to five different breeds or lines. There is also the challenge of effectively collecting crossbred information at the various levels of the breeding pyramid, where the breeding company may not have control of crossing and data collection. Alternative analyses might incorporate ideas developed for multi-breed evaluations in cattle (Klei et al., 1996), especially in the definition and use of genetic groups (Wolf et al., 2002).


Design and implementation of breeding programs


Effective genetic improvement programs benefit from increases in the amount of information used in evaluating selection candidates, as well as from tools that help maximize genetic progress while balancing other tactical and strategic issues such as inbreeding and maintenance of genetic diversity. While crossbred information can have important effects on accuracy, its use must also be considered in terms of overall breeding program design.


As has been stated previously, when selection is done at the GN level, differences are often observed between performance at the GN and what is expected at the commercial level. To address this issue, a selection index can be constructed that includes information on performance of crossbred relatives. However, an index based on more information from relatives promotes increased rates of inbreeding (and concomitant loss of genetic variation, fitness and performance). An optimal solution to this problem is to use an optimized selection index that includes information from crossbreds but maximizes genetic gain while restricting rate of inbreeding.


Kremer et al. (2001) presented results from deterministic simulations of a selection index that included information on performance of crossbred relatives. They addressed the issues of balancing genetic gain with rate of inbreeding by optimizing the weightings put on the different sources of information in the selection index. In the base situation the genetic gain (ΔG) was considered at a fixed rate of inbreeding (ΔF) of 2%, heritability of 0.3, and rpc = 0.4. Twenty sires per year were used on 200 GN dams. The same sires were also used to produce crossbred progeny from 200 commercial dams. Both GN and commercial dams produced 16 progeny/dam/year. Results from this “base” scenario as well as other situations are summarized in Table 1.


The base scenario reflects operational levels of infrastructure for a typical GN line. They observed a 34% increase in genetic gain at the crossbred level (measured in phenotypic standard deviations, σp) at the expense of a 13% reduction in gain at the purebred level. When crossbred information is used and breeding objectives are set up correctly, so that they reflect improvement in the commercial product, this sort of response would be expected. Further results included:


·        When considering higher rates of inbreeding, higher levels of crossbred gain are expected at the expense of lower levels of purebred gain;

·        Traits with low heritability (and small rpc) will benefit greatly from crossbred records;

·        Doubling the number of commercial dams producing crossbred progeny generated only a very slight improvement in crossbred gain (36% versus 34%), showing that the value of crossbred information diminishes as the number of observations increases;

·        It might be more advantageous to fill testing facilities with progeny from a larger number of sows to control population structure and inbreeding.
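The diminishing value of additional crossbred records can be illustrated with the textbook accuracy of a progeny test on n half-sib progeny, sqrt(n h² / (4 + (n - 1) h²)). This is a deliberate simplification and not the optimized index of Kremer et al. (2001): it ignores full-sib structure within litters, purebred information and the inbreeding constraint. The figure of 160 crossbred progeny per sire assumes that the 20 base-scenario sires are used equally across 200 commercial sows with 16 progeny each.

```python
def progeny_test_accuracy(n_progeny, h2):
    """Accuracy of a sire's breeding value estimated from the mean of
    n half-sib progeny records (textbook formula, half-sibs only)."""
    if n_progeny < 1 or not 0.0 < h2 <= 1.0:
        raise ValueError("need n_progeny >= 1 and 0 < h2 <= 1")
    return (n_progeny * h2 / (4.0 + (n_progeny - 1) * h2)) ** 0.5


# Assumed equal sire use: 200 sows x 16 progeny / 20 sires = 160 progeny per sire.
h2 = 0.3
base = progeny_test_accuracy(160, h2)
doubled = progeny_test_accuracy(320, h2)
halved = progeny_test_accuracy(80, h2)
extra_from_doubling = doubled - base
lost_from_halving = base - halved
```

Under these assumptions the accuracy is already high at 160 progeny per sire, so doubling the number of commercial sows adds less than halving them removes, consistent in direction with the small change in crossbred gain reported above.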


Table 1. Relative gain at the commercial level when using information from crossbred relatives


Genetic gain (ΔG, %)

ΔF = 4%
ΔF = 1%
h² = 0.1
rpc = 0.7
Commercial sows doubled
Commercial sows halved
Crossbred progeny halved

*Base situation: ΔF = 2%; h² = 0.3; rpc = 0.4; 20 sires; 200 GN dams producing 16 purebred progeny/dam/year; 200 commercial sows producing 16 crossbred progeny/commercial sow/year.


Bijma and Van Arendonk (1998) observed similar results. Bijma et al. (2001) extended this work by incorporating rate of inbreeding through the use of long-term genetic contributions[2] (Woolliams et al., 1999). Key to their conclusions was that, with a constant number of parents, changing from pureline to combined crossbred and purebred selection increased the rate of inbreeding, especially in cases with high heritabilities and low rpc, where combined selection requires a larger number of parents to avoid excessive rates of inbreeding. At equal rates of inbreeding, combined selection was superior to pureline selection.


The studies mentioned above concentrated on two issues in breeding program design, namely genetic gain and inbreeding. A framework for the effective integration of crossbred information into the selection process will also account for allocation of sire usage, manipulation of trait distributions, marker genotypes (if known), and connectedness, among other issues. Such a framework exists in Total Genetic Resource Management, or TGRM (Kinghorn et al., 2002). Of primary importance in the use of TGRM is the ability to weigh the value of commercial information against the cost of its collection and to determine the optimal allocation of sires to the crossbred program.


Summary and Conclusions


Crossbred information has different uses in different livestock species of economic importance. In beef cattle, crossbred information is important for multi-breed genetic evaluation to allow comparisons between animals of varying genetic makeup (grading-up systems). In pigs, by contrast, the inclusion of commercial crossbred information in genetic evaluations allows for more consistent ranking of GN sires on the performance of their crossbred progeny in different environments. The use of crossbred records in genetic evaluation of pigs can have great impact on selection of candidates and overall response. Inherent to the use of the information is a well-defined breeding objective targeted at improvement of the commercial product rather than improvement of pureline performance only. The use of crossbred information will depend on the trait(s) chosen for improvement. In general, traits with low heritability and low purebred-crossbred correlation will enjoy the greatest benefit from the use of crossbred records.




Arnold, J. W., Bertrand, J. K. and Benyshek, L. L. (1992). Animal model for genetic evaluation of multi-breed data. Journal of Animal Science 70, 3322-3332.


Besbes, B. and Gibson, J.P. (1999). Genetic variation of egg production traits in purebred and crossbred laying hens. Animal Science 68, 433-439.


Bijma, P. and Van Arendonk, J.A.M. (1998). Maximizing genetic gain for the sire line of a crossbreeding scheme utilizing both purebred and crossbred information. Animal Science 66, 529-542.


Bijma, P., Woolliams, J.A. and Van Arendonk, J.A.M. (2001). Genetic gain of pure line selection and combined crossbred selection with constrained inbreeding. Animal Science 72, 225-232.


Brandt, H. and Täubert, H. (1998). Parameter estimates for purebred and crossbred performances in pigs. Journal of Animal Breeding and Genetics 115, 97-104.


Hill, W. G. (1970). Theory of limits to selection with line crossing. In: ‘Mathematical Topics in Population Genetics’ (Ed. Kojima) pp. 210-245. (Springer Verlag: Berlin).


Kinghorn, B., Meszaros, S.A. and Vagg, R.D. (2002). Dynamic tactical decision systems for animal breeding. Seventh World Congress on Genetics Applied to Livestock Production, 23-07.


Klei, L., Quaas, R.L., Pollak, E. J. and Cunningham B. E. (1996). Multiple-breed Evaluation. Available at Accessed 10 November 2002.


Kremer, V.D., Knap, P.W., Van der Steen, H.A.M., Villanueva, B. and Woolliams, J.A. (2001). Optimizing breeding programs whilst simultaneously managing genetic gain and inbreeding rate. 52nd Annual Meeting of the European Association for Animal Production (EAAP), August 2001, Budapest, Hungary.


Lo, L.L., Fernando, R.L. and Cantet, R.J.C. (1995). Theory for modeling means and covariances in a two-breed population with dominance inheritance. Theoretical and Applied Genetics 90, 49-62.


Lo, L.L., Fernando, R.L. and Grossman, M. (1997). Genetic evaluation by BLUP in two-breed terminal crossbreeding systems under dominance. Journal of Animal Science 75, 2877-2884.


Lutaaya, E., Misztal, I., Mabry, J.W., Short, T., Timm, H.H., and Holzbauer, R. (2001). Genetic parameter estimates from joint evaluation of purebreds and crossbreds in swine using the crossbred model. Journal of Animal Science 79, 3002-3007.


Lutaaya, E., Misztal, I., Mabry, J.W., Short, T., Timm, H.H., and Holzbauer, R. (2002). Joint evaluation of purebreds and crossbreds in swine. Journal of Animal Science 80, 2263-2266.


Newman, S., Reverter, A. and Johnston, D.J. (2002). Purebred-crossbred performance and genetic evaluation of postweaning growth and carcass traits in Bos indicus x Bos taurus crosses in Australia.  Journal of Animal Science 80, 1801-1808.


Pirchner, F. and Mergl, R. (1977). Overdominance as a cause for heterosis in poultry. Journal of Animal Breeding and Genetics 94, 151-158.


Spilke, J., Groeneveld, E. and Mielenz, N. (1998). Joint purebred and crossbred (co)variance component estimation with a pseudo multiple-trait model: Loss in efficiency. Journal of Animal Breeding and Genetics 115, 341-350.


Wei, M. and Van der Steen, H. A. M. (1991). Comparison of reciprocal recurrent selection with pure-line selection systems in animal breeding (A review). Animal Breeding Abstracts 59, 281-298.


Wei, M., Van der Werf, J. H. J. and Brascamp, E. W. (1991). Relationship between purebred and crossbred parameters. II. Genetic correlation between purebred and crossbred performance under the model with two loci. Journal of Animal Breeding and Genetics 108, 262-269.


Wei, M. and Van der Werf, J. H. J. (1994). Maximizing genetic response in crossbreds using both purebred and crossbred information. Animal Production 58, 401-413.


Wei, M. and Van der Werf, J. H. J. (1995). Genetic correlation and heritabilities for purebred and crossbred performance in poultry egg production traits. Journal of Animal Science 73, 2220-2226.


Wolf, J., Peskovicova, D., Wolfova, M. and Groeneveld, E. (2002). Impact of genetic groups and crossbred information on the prediction of breeding values in pig sire breeds. Czech Journal of Animal Science 47, 219-229.


Woolliams, J.A., Bijma, P. and Villanueva, B. (1999). Expected genetic contributions and their impact on gene flow and genetic gain. Genetics 153, 1009-1020.


[1] Breed and line can be used interchangeably in the context of genetic evaluation. Usage of breed is more appropriate for beef cattle, while line is a useful term for pigs.


[2] The long-term contribution of an individual is its proportional contribution to the genetic makeup of the population in the long term.