Package 'gpcp'

Title: Genomic Prediction of Cross Performance
Description: The gpcp package provides tools to perform genomic prediction of cross performance in plant breeding using marker and phenotypic data. It implements mixed-model methods to estimate mean F1 performance across many potential crosses.
Authors: Marlee Labroo [aut], Christine Nyaga [cre, aut], Lukas Mueller [aut]
Maintainer: Christine Nyaga <[email protected]>
License: GPL (>= 3)
Version: 0.1.1
Built: 2026-05-16 09:08:01 UTC
Source: https://github.com/cmn92/gpcp


Example Phenotype Data

Description

This is a sample phenotype dataset used for genomic prediction.

Usage

phenotypeFile

Format

A data frame with 24 columns:

ATW

Description of ATW

AUDPC_YAD

Area Under Disease Progress Curve for YAD

AUDPC_YMV

Area Under Disease Progress Curve for YMV

Accession

Genotype IDs for each individual

Block

Block information

DMC

Dry Matter Content values

Design

Experimental design

LOC

Location of the trials

NPH

Number of Plants Harvested

OXBI

Oxidation Index

Oxint180Minutes

Oxidation intensity after 180 minutes

PLOT

Plot number

REP

Replication number

Settweight

Weight of the planting setts

TTNPL

Total Tuber Number per Plant

TTWPL

Total Tuber Weight per Plant

Trial

Trial name or ID

Vigor

Plant vigor score

YIELD

Yield values

Year

Year of the experiment

Yield.per.plot..kg.

Yield per plot in kilograms

Yield_udj

Unadjusted Yield

rAUDPC_YAD

Relative AUDPC for YAD

rAUDPC_YMV

Relative AUDPC for YMV

Source

Generated for the gpcp package example

Examples

data(phenotypeFile)
head(phenotypeFile)
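
Before passing a phenotype table to a prediction run, it can help to confirm that the columns you plan to use are present and to see how many records each trait has. The following is a minimal sketch on a toy data frame that mimics only a few of the columns documented above; the helper `missing_columns()` is hypothetical and not part of gpcp.

```r
# Toy stand-in for phenotypeFile (values are made up for illustration).
pheno <- data.frame(
  Accession = c("G1", "G1", "G2", "G2"),
  LOC       = c("A", "B", "A", "B"),
  REP       = c(1, 1, 1, 1),
  YIELD     = c(10.2, 11.5, 8.9, NA),
  DMC       = c(30.1, 29.8, 33.0, 32.4)
)

# Hypothetical helper: return the requested column names that are absent.
missing_columns <- function(data, columns) {
  setdiff(columns, names(data))
}

needed <- c("Accession", "LOC", "REP", "YIELD", "DMC")
absent <- missing_columns(pheno, needed)  # character(0) when all are present

# Number of non-missing records per trait.
n_obs <- colSums(!is.na(pheno[, c("YIELD", "DMC")]))
```

The same two checks can be applied unchanged to the packaged `phenotypeFile` after `data(phenotypeFile)`.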

Genomic Prediction of Cross Performance

Description

This function performs genomic prediction of cross performance using genotype and phenotype data. It works in several steps: loading the required packages, converting the genotype data, processing the phenotype data, fitting mixed models, and predicting cross performance from weighted marker effects.

Usage

runGPCP(phenotypeFile, genotypeFile, genotypes, traits, weights = NA, userSexes = "",
        userFixed = NA, userRandom = NA, Ploidy = NA, NCrosses = NA)

Arguments

phenotypeFile

A data frame containing phenotypic data, typically read from a CSV file.

genotypeFile

A file path to the genotypic data, either in VCF format or as a HapMap.

genotypes

A character string representing the column name in the phenotype file that corresponds to the genotype IDs.

traits

A string of comma-separated trait names from the phenotype file, which will be used for genomic prediction.

weights

A numeric vector specifying the weights for the traits. The order of weights should correspond to the order of traits.

userSexes

Optional. A string representing the column name in the phenotype file corresponding to the individuals' sexes.

userFixed

A string of comma-separated fixed effect variables from the phenotype file. If no fixed effects are required, set to NA.

userRandom

A string of comma-separated random effect variables from the phenotype file. If no random effects are required, set to NA.

Ploidy

An integer representing the ploidy level of the organism (e.g., 2, 4, 6).

NCrosses

An integer specifying the number of top crosses to output. Maximum is a full diallel.
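
Because traits are passed as one comma-separated string while weights are passed as a numeric vector in the same order, it is easy to misalign the two. The sketch below shows one way to keep them in step by naming the weights first; the variable names `traits_arg` and `weights_arg` are illustrative only.

```r
# Trait names in the order they should be passed.
traits <- c("rAUDPC_YMV", "YIELD", "DMC")

# Weights keyed by trait name, deliberately in a different order.
weights <- c(YIELD = 3, DMC = 1, rAUDPC_YMV = 0.2)

# Fail early if a trait has no weight or a weight has no trait.
stopifnot(setequal(names(weights), traits))

# Reorder the weights to follow the trait order, then drop the names.
weights_arg <- unname(weights[traits])

# Collapse the trait names into the comma-separated string form.
traits_arg <- paste(traits, collapse = ",")

# The string can be split back into a vector when needed.
traits_back <- strsplit(traits_arg, ",", fixed = TRUE)[[1]]
```

`traits_arg` and `weights_arg` can then be supplied as the `traits` and `weights` arguments.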

Details

This function is designed for genomic prediction of cross performance and can handle both diploid and polyploid species. It processes genotype data, calculates genetic relationships, and fits mixed models using the 'sommer' package. It outputs the best predicted crosses based on user-defined traits and weights.
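
To give a feel for how trait weights and marker effects can combine into a single cross ranking, here is a simplified sketch on simulated data. It assumes purely additive marker effects and scores each cross by the mid-parent value of a weighted index; this is an illustration of the general idea only, not the model that runGPCP fits, which may include further terms. All object names and values below are made up.

```r
set.seed(1)
n_ind  <- 4
n_mark <- 6

# Marker dosages coded 0/1/2 (diploid); rows are individuals.
M <- matrix(
  sample(0:2, n_ind * n_mark, replace = TRUE),
  nrow = n_ind,
  dimnames = list(paste0("G", 1:n_ind), paste0("m", 1:n_mark))
)

# Made-up additive marker effects, one column per trait.
effects <- matrix(
  rnorm(n_mark * 2),
  nrow = n_mark,
  dimnames = list(colnames(M), c("YIELD", "DMC"))
)
weights <- c(YIELD = 3, DMC = 1)

# Weighted index effect per marker, then index value per individual.
index_effects <- effects %*% weights
gv <- drop(M %*% index_effects)

# All unordered parent pairs, scored by the mid-parent index value.
pairs <- t(combn(rownames(M), 2))
crosses <- data.frame(
  Parent1 = pairs[, 1],
  Parent2 = pairs[, 2],
  CrossPredictedMerit = unname((gv[pairs[, 1]] + gv[pairs[, 2]]) / 2),
  row.names = NULL
)

# Best crosses first.
crosses <- crosses[order(crosses$CrossPredictedMerit, decreasing = TRUE), ]
```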

Value

A data frame containing predicted crosses with the following columns:

Parent1

First parent genotype ID.

Parent2

Second parent genotype ID.

CrossPredictedMerit

Predicted merit of the cross.

P1Sex

Optional. Sex of the first parent if userSexes is provided.

P2Sex

Optional. Sex of the second parent if userSexes is provided.
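
The returned data frame can be post-processed with ordinary data frame operations. The sketch below builds a toy table with the columns listed above and keeps only crosses between parents of different sex, ranked by predicted merit. The sex labels "F" and "M" are an assumption about how sexes are coded, and the values are invented.

```r
# Toy result with the documented column names (values are made up).
crosses <- data.frame(
  Parent1 = c("G1", "G1", "G2", "G3"),
  Parent2 = c("G2", "G3", "G3", "G4"),
  CrossPredictedMerit = c(1.8, 2.4, 0.9, 2.1),
  P1Sex = c("F", "F", "M", "M"),
  P2Sex = c("M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Keep crosses whose parents differ in sex.
feasible <- crosses[crosses$P1Sex != crosses$P2Sex, ]

# Rank by predicted merit, best first, and take the top two.
feasible <- feasible[order(feasible$CrossPredictedMerit, decreasing = TRUE), ]
top2 <- head(feasible, 2)
```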

Note

This function relies on the 'sommer', 'dplyr', and 'AGHmatrix' packages for processing mixed models and genomic data.

Author(s)

Marlee Labroo, Christine Nyaga, Lukas Mueller

References

Xiang, J., et al. (2016). "Mixed Model Methods for Genomic Prediction." Nature Genetics.

Batista, L., et al. (2021). "Genetic Prediction and Relationship Matrices." Theoretical and Applied Genetics.

See Also

sommer, dplyr, Gmatrix

Examples

# Load phenotype data from CSV
phenotypeFile <- read.csv("~/Documents/GCPC_input_files/2020_TDr_PHENO (1).csv")

# Genotype file path
genotypeFile <- "~/Documents/GCPC_input_files/genotypeFile.vcf"


# Define inputs
genotypes <- "Accession"
traits <- c("rAUDPC_YMV", "YIELD", "DMC")
weights <- c(0.2, 3, 1)
userFixed <- c("LOC", "REP")
Ploidy <- 2
NCrosses <- 150

# Run genomic prediction of cross performance
finalcrosses <- runGPCP(
    phenotypeFile = phenotypeFile,
    genotypeFile = genotypeFile,
    genotypes = genotypes,
    traits = paste(traits, collapse = ","),
    weights = weights,
    userFixed = paste(userFixed, collapse = ","),
    Ploidy = Ploidy,
    NCrosses = NCrosses
)

# View the predicted crosses
print(finalcrosses)

Genomic Prediction of Cross Performance

Description

This function performs genomic prediction of cross performance using genotype and phenotype data.

Usage

runGPCP(
  phenotypeFile,
  genotypeFile = NA,
  genotypeData = NA,
  genotypes,
  traits,
  weights = NA,
  userSexes = "",
  userFixed = NA,
  userRandom = NA,
  Ploidy = NA,
  NCrosses = NA
)

Arguments

phenotypeFile

A data frame containing phenotypic data, typically read from a CSV file.

genotypeFile

Path to the genotypic data, either in VCF or HapMap format.

genotypeData

A data frame containing genotypic data, used when genotypeFile is not provided.

genotypes

A character string representing the column name in the phenotype file for the genotype IDs.

traits

A string of comma-separated trait names from the phenotype file.

weights

A numeric vector specifying weights for the traits.

userSexes

A string representing the column name corresponding to the individuals' sexes.

userFixed

A string of comma-separated fixed effect variables.

userRandom

A string of comma-separated random effect variables.

Ploidy

An integer representing the ploidy level of the organism.

NCrosses

An integer specifying the number of top crosses to output.
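
The number of crosses that can be returned is limited by the number of distinct parent pairs. As a rough sketch, assuming crosses are counted as unordered pairs without selfing (how the package itself counts them is not stated here), the upper bound for a set of parents can be computed as follows. The parent IDs are invented.

```r
# Hypothetical parent IDs.
parents <- c("G1", "G2", "G3", "G4", "G5")

# All unordered pairs of distinct parents, one row per cross.
pairs <- t(combn(parents, 2))
n_max <- nrow(pairs)

# Cap a requested number of crosses at what is available.
requested <- 150
n_crosses <- min(requested, n_max)
```

With five parents this gives `choose(5, 2)`, so a request for 150 crosses would be capped well below that.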

Value

A data frame containing predicted cross performance.

Examples

# Diploid pipeline
# Load the example phenotype data and locate the example VCF shipped with the package
phenotypeFile <- read.csv(system.file("extdata", "phenotypeFile.csv", package = "gpcp"))
genotypeFile <- system.file("extdata", "genotypeFile_Chr9and11.vcf", package = "gpcp")
finalcrosses <- runGPCP(
    phenotypeFile = phenotypeFile,
    genotypeFile = genotypeFile,
    genotypes = "Accession",
    traits = "YIELD,DMC",
    weights = c(3, 1),
    userFixed = "LOC,REP",
    Ploidy = 2,
    NCrosses = 150
)
# View the top predicted crosses
print(head(finalcrosses))

# Polyploid pipeline
# 1) load the example polyploid data shipped with 'sommer'
data(DT_polyploid, package = "sommer")
DT <- DT_polyploid
GT <- GT_polyploid
MP <- MP_polyploid
# 2) convert A/T/C/G strings to numeric codes
numo <- sommer::atcg1234(data = GT, ploidy = 4)

# 3) find the set of individuals common to genotypes and phenotypes
common <- intersect(DT$Name, rownames(numo$M))
marks  <- numo$M[common, , drop = FALSE]
pheno2 <- as.data.frame(DT[match(common, DT$Name), ])
# 4) call runGPCP with ploidy = 4
result4x <- suppressWarnings(
  runGPCP(
    phenotypeFile = pheno2,
    genotypeData   = marks,
    genotypes      = "Name",
    traits         = "total_yield,tuber_length",  # comma-separated string, as documented
    weights        = c(3, 1),
    Ploidy         = 4,
    NCrosses       = 100
  )
)
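
Step 3 of the polyploid example, which restricts both inputs to the individuals they share and puts them in the same order, can be tried on a small scale. The sketch below uses invented IDs and tetraploid dosages (0 to 4) rather than the sommer example data.

```r
# Toy phenotypes: four individuals.
pheno <- data.frame(
  Name        = c("A", "B", "C", "D"),
  total_yield = c(5.1, 6.3, 4.8, 7.0),
  stringsAsFactors = FALSE
)

# Toy marker matrix: three individuals, only two of them phenotyped.
geno <- matrix(
  c(0, 1, 2,
    4, 3, 2,
    1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("C", "A", "E"), c("m1", "m2", "m3"))
)

# Individuals present in both inputs, in phenotype order.
common <- intersect(pheno$Name, rownames(geno))

# Subset and reorder both inputs to the common set.
geno_aligned  <- geno[common, , drop = FALSE]
pheno_aligned <- pheno[match(common, pheno$Name), ]

# Both objects now list the same individuals in the same order.
stopifnot(identical(pheno_aligned$Name, rownames(geno_aligned)))
```

Using `drop = FALSE` keeps the marker object a matrix even if only one individual is shared.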