Genomic Selection Basics
Genomic selection (GS) is a breeding approach that uses genome-wide DNA markers to predict the performance of individuals before field testing.
It enables faster, data-driven selection decisions in long-cycle crops such as tea, maize, and cassava.
Why It Matters
Traditional selection relies on years of phenotypic observation.
GS, by contrast, uses statistical models trained on genotype–phenotype datasets to compute genomic estimated breeding values (GEBVs), reducing the need for full-cycle trials.
This allows:
- Earlier identification of promising genotypes
- Reduced breeding cycle times
- Greater genetic gain per unit time
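The "gain per unit time" argument is usually framed with the breeder's equation, where annual gain is selection intensity times selection accuracy times additive genetic standard deviation, divided by cycle length in years. The sketch below only illustrates that arithmetic. Every number in it is a hypothetical placeholder, not a result from any breeding program.

```python
def gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Breeder's equation: expected genetic gain per year."""
    return intensity * accuracy * sigma_a / cycle_years

# Hypothetical inputs, chosen only to show the trade-off:
# GS may be less accurate per cycle, yet still win by shortening the cycle.
phenotypic = gain_per_year(intensity=1.76, accuracy=0.7, sigma_a=1.0, cycle_years=10)
genomic = gain_per_year(intensity=1.76, accuracy=0.5, sigma_a=1.0, cycle_years=4)

print(f"phenotypic selection: {phenotypic:.3f} SD units/year")
print(f"genomic selection:    {genomic:.3f} SD units/year")
```

With these assumed values, the shorter cycle more than offsets the lower accuracy. Real programs would need their own estimates of each term.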
Key Steps in a GS Workflow
- Genotyping: collect genome-wide marker data (e.g., SNP arrays, sequencing).
- Phenotyping: measure target traits in a training population.
- Model training: fit a statistical or machine-learning model linking markers to traits.
- Prediction: use the trained model to estimate GEBVs for untested individuals.
- Selection: choose top candidates for crossing or field validation.
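The five steps above can be sketched end to end on simulated data. This is a minimal illustration, not a production pipeline: the genotypes, marker effects, and phenotypes are all simulated, the model is plain ridge regression on markers (the same idea as ridge-regression BLUP, but with a fixed penalty), and the names `fit_ridge` and `lam` as well as the penalty value are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Genotyping: simulated SNP matrix coded 0/1/2 (individuals x markers).
n_ind, n_snp = 300, 200
allele_freqs = rng.uniform(0.1, 0.9, n_snp)
X = rng.binomial(2, allele_freqs, size=(n_ind, n_snp)).astype(float)

# 2. Phenotyping: simulated trait = genetic value + noise (heritability near 0.5).
true_effects = rng.normal(0.0, 1.0, n_snp)
g_true = X @ true_effects
y = g_true + rng.normal(0.0, g_true.std(), n_ind)

train = np.arange(250)          # phenotyped training population
cand = np.arange(250, n_ind)    # untested selection candidates

# 3. Model training: ridge regression on centered markers.
def fit_ridge(X_train, y_train, lam):
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Z = X_train - col_means
    # Dual (n x n) form, convenient when markers outnumber individuals.
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu, col_means, Z.T @ alpha   # intercept, centering, marker effects

# lam is an arbitrary illustrative penalty; in practice it is estimated
# (e.g. from variance components) or tuned by cross-validation.
mu, col_means, beta = fit_ridge(X[train], y[train], lam=100.0)

# 4. Prediction: GEBVs for the untested candidates.
gebv = mu + (X[cand] - col_means) @ beta

# 5. Selection: keep the ten candidates with the highest GEBV.
top10 = cand[np.argsort(gebv)[::-1][:10]]
print("selected candidates:", top10)
```

In real use, the simulated arrays would be replaced by quality-filtered marker data and field phenotypes, and the model would typically come from a dedicated package such as those listed in the table of common models.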
Common Statistical Models
| Model | Description | Software |
|---|---|---|
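| GBLUP | Linear mixed model using a genomic relationship matrix; assumes all marker effects come from one normal distribution with a common variance | rrBLUP, sommer (R) |
| BayesB/BayesC | Bayesian variable-selection models; BayesB allows marker-specific variances, BayesC uses a common variance for nonzero effects | BGLR (R) |
| Random Forest | Non-parametric ML model capturing interactions | caret, ranger (R) |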
| DeepGS | Neural network-based approach | TensorFlow, Keras |
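To make the GBLUP row concrete, here is a minimal NumPy sketch. It builds a genomic relationship matrix in the style of VanRaden (2008, method 1) and solves for genomic values with heritability treated as known. The function names, the fixed `h2`, the simple-mean intercept, and the simulated data are all assumptions of this example. Packages such as rrBLUP or sommer estimate the variance components rather than taking them as given.

```python
import numpy as np

def vanraden_G(X):
    """Genomic relationship matrix from 0/1/2 genotype codes."""
    p = X.mean(axis=0) / 2.0                 # observed allele frequencies
    Z = X - 2.0 * p                          # center each marker
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(G, y, train, h2):
    """Predict genomic values for all individuals from phenotypes on `train`.

    Assumes y = mu + g + e with g ~ N(0, G*var_g), e ~ N(0, I*var_e),
    and h2 = var_g / (var_g + var_e) known in advance (a simplification).
    """
    lam = (1.0 - h2) / h2                    # var_e / var_g
    mu = y[train].mean()                     # simple mean as intercept (simplification)
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return G[:, train] @ alpha               # GEBVs as deviations from the mean

# Simulated demonstration data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n_ind, n_snp = 300, 200
X = rng.binomial(2, rng.uniform(0.1, 0.9, n_snp), size=(n_ind, n_snp)).astype(float)
g_true = X @ rng.normal(0.0, 1.0, n_snp)
y = g_true + rng.normal(0.0, g_true.std(), n_ind)

train = np.arange(250)
test = np.arange(250, n_ind)
G = vanraden_G(X)
gebv = gblup(G, y, train, h2=0.5)
print("correlation with simulated genetic value (test set):",
      round(float(np.corrcoef(gebv[test], g_true[test])[0, 1]), 2))
```

Because the relationship matrix is built from all genotyped individuals, candidates without phenotypes receive predictions through their genomic similarity to the training set.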
Evaluating Accuracy
Prediction accuracy depends on:
- Marker density and quality
- Training population size
- Genetic relatedness between training and test sets
- Trait heritability
Use cross-validation or forward prediction to quantify performance before deployment.
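A k-fold cross-validation loop might look like the sketch below. It reports the correlation between predicted and observed phenotypes in each held-out fold, often called predictive ability. The helper names (`kfold_predictive_ability`, `ridge_fit_predict`), the ridge penalty, and the simulated data are assumptions of this example, and any model with the same fit-then-predict shape could be swapped in.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam=100.0):
    """Ridge regression on centered markers; lam is an illustrative penalty."""
    mu = y_train.mean()
    col_means = X_train.mean(axis=0)
    Z = X_train - col_means
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(len(y_train)), y_train - mu)
    return mu + (X_test - col_means) @ (Z.T @ alpha)

def kfold_predictive_ability(X, y, fit_predict, k=5, seed=0):
    """Correlation between predictions and observed phenotypes, per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return np.array(scores)

# Simulated data (hypothetical), 0/1/2 genotypes and a trait with noise.
rng = np.random.default_rng(2)
X = rng.binomial(2, rng.uniform(0.1, 0.9, 200), size=(300, 200)).astype(float)
g = X @ rng.normal(0.0, 1.0, 200)
y = g + rng.normal(0.0, g.std(), 300)

scores = kfold_predictive_ability(X, y, ridge_fit_predict, k=5)
print("per-fold predictive ability:", np.round(scores, 2))
print("mean:", round(float(scores.mean()), 2))
```

Random folds tend to give optimistic estimates when close relatives fall on both sides of a split. Forward prediction, which trains on earlier cohorts and tests on later ones, is closer to how the model would actually be deployed.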
Further Reading
- Meuwissen, T., Hayes, B., & Goddard, M. (2001). Prediction of total genetic value using genome-wide dense marker maps. Genetics, 157(4), 1819–1829.
- Lubanga et al. (2023). Genomic selection strategies to increase genetic gain in tea breeding programs. The Plant Genome.