| Title: | Genomic Prediction of Cross Performance |
|---|---|
| Description: | The gpcp package provides tools to perform genomic prediction of cross performance in plant breeding using marker and phenotypic data. It implements mixed-model methods to estimate mean F1 performance across many potential crosses. |
| Authors: | Marlee Labroo [aut], Christine Nyaga [cre, aut], Lukas Mueller [aut] |
| Maintainer: | Christine Nyaga <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.1.1 |
| Built: | 2026-05-16 09:08:01 UTC |
| Source: | https://github.com/cmn92/gpcp |
This is a sample phenotype dataset used for genomic prediction.
phenotypeFile
A data frame with 24 columns:
Description of ATW
Area Under Disease Progress Curve for YAD
Area Under Disease Progress Curve for YMV
Genotype IDs for each individual
Block information
Dry Matter Content values
Experimental design
Location of the trials
Number of Plants Harvested
Oxidation Index
Oxidation intensity after 180 minutes
Plot number
Replication number
Weight of the planting setts
Total Tuber Number per Plant
Total Tuber Weight per Plant
Trial name or ID
Plant vigor score
Yield values
Year of the experiment
Yield per plot in kilograms
Unadjusted Yield
Relative AUDPC for YAD
Relative AUDPC for YMV
Generated for the gpcp package example
```r
data(phenotypeFile)
head(phenotypeFile)
```
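The column list describes plot-level records, with location, block, replication and plot number stored alongside each trait. As a sketch of that long layout, the base-R code below builds a small hypothetical data frame using column names that appear in the runGPCP examples (Accession, LOC, REP, YIELD, DMC) and computes raw per-accession means. The values are invented, and these unadjusted means only illustrate the data shape; runGPCP itself fits mixed models rather than averaging.

```r
# Hypothetical toy data in the same long layout: one row per plot.
# Column names follow the package examples; all values are invented.
pheno <- data.frame(
  Accession = rep(c("A1", "A2"), each = 4),
  LOC       = rep(c("L1", "L2"), times = 4),
  REP       = rep(c(1, 1, 2, 2), times = 2),
  YIELD     = c(10, 12, 11, 13, 8, 9, 7, 10),
  DMC       = c(30, 31, 29, 32, 35, 34, 36, 33),
  stringsAsFactors = FALSE
)

# Raw (unadjusted) per-accession trait means across locations and replicates
means <- aggregate(cbind(YIELD, DMC) ~ Accession, data = pheno, FUN = mean)
print(means)
```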
This function performs genomic prediction of cross performance from genotype and phenotype data. It works in several steps: loading the required packages, converting the genotype data, processing the phenotype data, fitting mixed models, and predicting cross performance from weighted marker effects.
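The last step, predicting cross performance from weighted marker effects, can be illustrated with a simplified, additive-only sketch in base R. It assumes hypothetical per-trait marker effects (of the kind a mixed model might estimate) and 0/1/2 allele dosages for the parents. The trait weights collapse the effects into one index effect per marker, and under a purely additive model the expected F1 mean of a cross equals the average of the two parents' index values. This is not gpcp's implementation: the package's sommer-fitted models may include non-additive effects that this sketch ignores.

```r
# Hypothetical per-trait marker effects: rows = markers, columns = traits
effects <- matrix(c(1, 0, 2, -1, 0.5,   # YIELD effects
                    0, 1, -1, 1, 0),    # DMC effects
                  nrow = 5,
                  dimnames = list(paste0("m", 1:5), c("YIELD", "DMC")))
weights <- c(YIELD = 3, DMC = 1)

# Index effect per marker = weighted sum of its per-trait effects
index_effects <- effects %*% weights[colnames(effects)]

# Hypothetical parental allele dosages (rows = parents, columns = markers)
dosage <- matrix(c(0, 1, 2, 1, 0,
                   2, 2, 1, 0, 1,
                   1, 0, 0, 2, 2),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("P1", "P2", "P3"), rownames(effects)))

# Additive index value of each parent
parent_value <- drop(dosage %*% index_effects)

# Every pair of distinct parents (half diallel without selfs)
pairs <- t(combn(rownames(dosage), 2))

# Additive expectation: F1 mean = mid-parent value
crosses <- data.frame(
  Parent1 = pairs[, 1],
  Parent2 = pairs[, 2],
  CrossPredictedMerit = unname(
    (parent_value[pairs[, 1]] + parent_value[pairs[, 2]]) / 2),
  stringsAsFactors = FALSE
)

# Rank crosses and keep the top NCrosses
NCrosses <- 2
crosses <- crosses[order(crosses$CrossPredictedMerit, decreasing = TRUE), ]
top <- head(crosses, NCrosses)
rownames(top) <- NULL
print(top)
```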
```r
runGPCP(phenotypeFile, genotypeFile, genotypes, traits, weights = NA,
        userSexes = "", userFixed = NA, userRandom = NA, Ploidy = NA,
        NCrosses = NA)
```
| Argument | Description |
|---|---|
| `phenotypeFile` | A data frame containing phenotypic data, typically read from a CSV file. |
| `genotypeFile` | Path to the genotypic data, in VCF or HapMap format. |
| `genotypes` | A character string giving the name of the phenotype-file column that holds the genotype IDs. |
| `traits` | A string of comma-separated trait names from the phenotype file to use for genomic prediction. |
| `weights` | A numeric vector of trait weights, in the same order as `traits`. |
| `userSexes` | Optional. A string giving the name of the phenotype-file column that holds the individuals' sexes. |
| `userFixed` | A string of comma-separated fixed-effect variables from the phenotype file. Set to `NA` if no fixed effects are required. |
| `userRandom` | A string of comma-separated random-effect variables from the phenotype file. Set to `NA` if no random effects are required. |
| `Ploidy` | An integer giving the ploidy level of the organism (e.g., 2, 4, 6). |
| `NCrosses` | An integer giving the number of top crosses to output. The maximum is a full diallel. |
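The arguments `traits`, `userFixed` and `userRandom` are documented as comma-separated strings, with `weights` matched to `traits` by position. The base-R sketch below shows one way such arguments could be parsed and checked. `split_names()` and the per-trait formulas are hypothetical illustrations, not gpcp internals (gpcp fits its mixed models with sommer). Because the polyploid example on the second runGPCP page passes `traits` as a character vector, the helper accepts that form too.

```r
# Example argument values in the documented form
traits    <- "YIELD,DMC"
weights   <- c(3, 1)
userFixed <- "LOC,REP"

# Hypothetical helper: split a comma-separated argument into trimmed names.
# A single NA means "none"; a character vector is also accepted.
split_names <- function(x) {
  if (length(x) == 1 && is.na(x)) return(character(0))
  trimws(unlist(strsplit(x, ",", fixed = TRUE)))
}

trait_names <- split_names(traits)
# Weights are matched to traits by position, so the lengths must agree
stopifnot(length(weights) == length(trait_names))
named_weights <- setNames(weights, trait_names)

# One single-trait formula per trait, e.g. YIELD ~ LOC + REP
fixed_terms <- split_names(userFixed)
rhs <- if (length(fixed_terms) > 0) fixed_terms else "1"
formulas <- lapply(trait_names, function(tr) reformulate(rhs, response = tr))
print(formulas)
```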
This function performs genomic prediction of cross performance and handles both diploid and polyploid species. It processes the genotype data, calculates genomic relationship matrices, and fits mixed models with the 'sommer' package. It returns the crosses with the highest predicted merit for the user-defined traits and weights.
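For the relationship step, one widely used estimator is VanRaden's genomic relationship matrix, which extends to autopolyploids by centering allele dosages at ploidy × allele frequency. The sketch below is a base-R illustration under that assumption. gpcp relies on AGHmatrix for relationship matrices, and which estimator and options runGPCP actually uses is not stated in this documentation; the function name `vanraden_G` and the toy dosage matrix are hypothetical.

```r
# VanRaden-style genomic relationship matrix for ploidy k:
#   G = Z Z' / (k * sum_j p_j (1 - p_j)),  with Z = M - k * p
vanraden_G <- function(M, ploidy = 2) {
  p <- colMeans(M) / ploidy                  # alternate-allele frequency per marker
  keep <- p > 0 & p < 1                      # drop fixed markers (zero variance)
  Z <- sweep(M[, keep, drop = FALSE], 2, ploidy * p[keep])  # center dosages
  tcrossprod(Z) / (ploidy * sum(p[keep] * (1 - p[keep])))
}

# Hypothetical diploid dosages: rows = individuals, columns = markers (0..2)
M <- matrix(c(0, 1, 2, 1,
              2, 1, 0, 1,
              1, 1, 1, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), paste0("m", 1:4)))
G <- vanraden_G(M, ploidy = 2)
print(round(G, 3))
```

For a tetraploid, the same function would take dosages from 0 to 4 with `ploidy = 4`.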
A data frame containing predicted crosses with the following columns:

| Column | Description |
|---|---|
| `Parent1` | Genotype ID of the first parent. |
| `Parent2` | Genotype ID of the second parent. |
| `CrossPredictedMerit` | Predicted merit of the cross. |
| `P1Sex` | Optional. Sex of the first parent; present only if `userSexes` is provided. |
| `P2Sex` | Optional. Sex of the second parent; present only if `userSexes` is provided. |
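When `userSexes` is supplied, the output carries `P1Sex` and `P2Sex`. The sketch below shows ordinary base-R post-processing of a data frame with these documented output columns: it keeps only pairs whose recorded sexes differ and ranks them by `CrossPredictedMerit`. The parent IDs, merit values and the "F"/"M" codes are hypothetical, and whether runGPCP already excludes same-sex pairs is not stated here.

```r
# Hypothetical runGPCP-style output with the documented columns
crosses <- data.frame(
  Parent1 = c("A", "A", "B", "C"),
  Parent2 = c("B", "C", "C", "D"),
  CrossPredictedMerit = c(5.2, 4.8, 6.1, 3.9),
  P1Sex = c("F", "F", "M", "F"),
  P2Sex = c("M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Keep crosses between parents with different recorded sexes
compatible <- crosses[crosses$P1Sex != crosses$P2Sex, ]

# Rank by predicted merit, best first
compatible <- compatible[order(compatible$CrossPredictedMerit, decreasing = TRUE), ]
rownames(compatible) <- NULL
print(compatible)
```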
This function relies on the 'sommer', 'dplyr', and 'AGHmatrix' packages for fitting mixed models and processing genomic data.
Marlee Labroo, Christine Nyaga, Lukas Mueller
```r
library(gpcp)

# Load phenotype data from CSV
phenotypeFile <- read.csv("~/Documents/GCPC_input_files/2020_TDr_PHENO (1).csv")

# Genotype file path
genotypeFile <- "~/Documents/GCPC_input_files/genotypeFile.vcf"

# Define inputs
genotypes <- "Accession"
traits <- c("rAUDPC_YMV", "YIELD", "DMC")
weights <- c(0.2, 3, 1)
userFixed <- c("LOC", "REP")
Ploidy <- 2
NCrosses <- 150

# Run genomic prediction of cross performance
finalcrosses <- runGPCP(
  phenotypeFile = phenotypeFile,
  genotypeFile = genotypeFile,
  genotypes = genotypes,
  traits = paste(traits, collapse = ","),
  weights = weights,
  userFixed = paste(userFixed, collapse = ","),
  Ploidy = Ploidy,
  NCrosses = NCrosses
)

# View the predicted crosses
print(finalcrosses)
```
Genomic Prediction of Cross Performance

This function performs genomic prediction of cross performance using genotype and phenotype data.
```r
runGPCP(phenotypeFile, genotypeFile = NA, genotypeData = NA, genotypes,
        traits, weights = NA, userSexes = "", userFixed = NA,
        userRandom = NA, Ploidy = NA, NCrosses = NA)
```
| Argument | Description |
|---|---|
| `phenotypeFile` | A data frame containing phenotypic data, typically read from a CSV file. |
| `genotypeFile` | Path to the genotypic data, in VCF or HapMap format. |
| `genotypeData` | A data frame or matrix of genotypic data, used when `genotypeFile` is not provided. |
| `genotypes` | A character string giving the name of the phenotype-file column that holds the genotype IDs. |
| `traits` | A string of comma-separated trait names from the phenotype file. |
| `weights` | A numeric vector of trait weights. |
| `userSexes` | A string giving the name of the column that holds the individuals' sexes. |
| `userFixed` | A string of comma-separated fixed-effect variables. |
| `userRandom` | A string of comma-separated random-effect variables. |
| `Ploidy` | An integer giving the ploidy level of the organism. |
| `NCrosses` | An integer giving the number of top crosses to output. |
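When marker data are passed through `genotypeData` instead of a file, the marker matrix and the phenotype table have to refer to the same individuals; step 3 of the polyploid example handles this with `intersect()`. The sketch below repeats that idea on hypothetical toy objects and shows one detail worth checking: `match()` returns only the first phenotype record per ID, whereas `%in%` keeps every plot record. Which of the two forms runGPCP expects is not stated in this documentation.

```r
# Hypothetical marker matrix: individuals as row names, markers as columns
marks <- matrix(c(0, 1, 2,
                  1, 1, 0,
                  2, 0, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("g1", "g2", "g3"), c("m1", "m2", "m3")))

# Hypothetical plot-level phenotypes; "g4" has no marker data
pheno <- data.frame(Name  = c("g2", "g3", "g4", "g2"),
                    yield = c(10, 12, 9, 11),
                    stringsAsFactors = FALSE)

# Individuals present in both sources
common <- intersect(unique(pheno$Name), rownames(marks))
marks_aligned <- marks[common, , drop = FALSE]

# match(): first record per individual only (as in the polyploid example)
first_records <- pheno[match(common, pheno$Name), ]

# %in%: every record of the shared individuals
all_records <- pheno[pheno$Name %in% common, ]
```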
A data frame containing predicted cross performance.
```r
library(gpcp)

# Diploid pipeline: load the example phenotype and genotype files shipped with gpcp
phenotypeFile <- read.csv(system.file("extdata", "phenotypeFile.csv", package = "gpcp"))
genotypeFile <- system.file("extdata", "genotypeFile_Chr9and11.vcf", package = "gpcp")

finalcrosses <- runGPCP(
  phenotypeFile = phenotypeFile,
  genotypeFile = genotypeFile,
  genotypes = "Accession",
  traits = "YIELD,DMC",
  weights = c(3, 1),
  userFixed = "LOC,REP",
  Ploidy = 2,
  NCrosses = 150
)
# print() shows the table; message() does not format a data frame
print(finalcrosses)

# Polyploid pipeline
# 1) load example data from the sommer package
data(DT_polyploid, package = "sommer")
DT <- DT_polyploid
GT <- GT_polyploid
MP <- MP_polyploid  # not used below

# 2) convert A/T/C/G strings to numeric codes
numo <- sommer::atcg1234(data = GT, ploidy = 4)

# 3) find the set of individuals common to genotypes and phenotypes
common <- intersect(DT$Name, rownames(numo$M))
marks <- numo$M[common, , drop = FALSE]
pheno2 <- as.data.frame(DT[match(common, DT$Name), ])

# 4) call runGPCP with Ploidy = 4
result4x <- suppressWarnings(
  runGPCP(
    phenotypeFile = pheno2,
    genotypeData = marks,
    genotypes = "Name",
    traits = c("total_yield", "tuber_length"),
    weights = c(3, 1),
    Ploidy = 4,
    NCrosses = 100
  )
)
```