| Title: | Tests for Segregation Distortion in Polyploids |
|---|---|
| Description: | Provides tests for segregation distortion in F1 polyploid populations under different assumptions of meiosis. These tests can account for double reduction, partial preferential pairing, and genotype uncertainty through the use of genotype likelihoods. Parallelization support is provided. Details of these methods are described in Gerard et al. (2025a) <doi:10.1007/s00122-025-04816-z> and Gerard et al. (2025b) <doi:10.1093/g3journal/jkaf212>. Part of this material is based upon work supported by the National Science Foundation under Grant No. 2132247. The opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation. |
| Authors: | David Gerard [aut, cre] (ORCID: <https://orcid.org/0000-0001-9450-5023>), Mira Thakkar [aut], Guilherme da Silva Pereira [ctb] (ORCID: <https://orcid.org/0000-0002-7106-8630>), NSF DBI 2132247 [fnd] (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2132247) |
| Maintainer: | David Gerard <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0.2 |
| Built: | 2026-06-04 07:48:05 UTC |
| Source: | https://github.com/dcgerard/segtest |
The frequency of (nullplex, simplex, duplex) gametes is (.5 + beta, .5 - 2 * beta, beta). This function returns the upper bound on beta under two models.
beta_bounds(ploidy, model = c("ces", "prcs"))beta_bounds(ploidy, model = c("ces", "prcs"))
ploidy |
The ploidy |
model |
Either complete equational segregation ( |
Returns the upper bound on the probability of a gamete with a genotype of 2 when the parent has a genotype of 1. This is based on two models. The upper bound from complete equational separation is higher than the upper bound from the pure random chromatid segregation. See Huang et al (2019) for a description of these models.
The upper bound on beta.
David Gerard
Huang, K., Wang, T., Dunn, D. W., Zhang, P., Cao, X., Liu, R., & Li, B. (2019). Genotypic frequencies at equilibrium for polysomic inheritance under double-reduction. G3: Genes, Genomes, Genetics, 9(5), 1693-1706. doi:10.1534/g3.119.400132
beta_bounds(4) beta_bounds(6) beta_bounds(8) beta_bounds(10)beta_bounds(4) beta_bounds(6) beta_bounds(8) beta_bounds(10)
This chi-squared test is run under the assumption of no double reduction and no preferential pairing.
chisq_g(x, g1, g2) chisq_g4(x, g1, g2)chisq_g(x, g1, g2) chisq_g4(x, g1, g2)
x |
Vector of observed genotype counts |
g1 |
Parent 1's genotype |
g2 |
Parent 2's genotype |
A list containing the chi-squared statistic, degrees of freedom, and p-value.
chisq_g4(): Alias for chisq_g, for backwards compatibility.
Mira Thakkar and David Gerard
x <- c(1, 2, 4, 3, 0) g1 <- 2 g2 <- 2 chisq_g(x, g1, g2) x <- c(10, 25, 10, 0, 0) g1 <- 1 g2 <- 1 chisq_g(x, g1, g2)x <- c(1, 2, 4, 3, 0) g1 <- 2 g2 <- 2 chisq_g(x, g1, g2) x <- c(10, 25, 10, 0, 0) g1 <- 1 g2 <- 1 chisq_g(x, g1, g2)
Calculates the MLE genotype and runs a chi-squared test assuming no double reduction and no preferential pairing.
chisq_gl(gl, g1, g2) chisq_gl4(gl, g1, g2)chisq_gl(gl, g1, g2) chisq_gl4(gl, g1, g2)
gl |
A matrix of offspring genotype log-likelihoods. The rows index the
individuals and the columns index the possible genotypes. So
|
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
A list containing the chi-squared statistic, degrees of freedom, and p-value.
chisq_gl4(): Alias for chisq_gl, for backwards compatibility.
Mira Thakkar and David Gerard
## null sim set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 0, xi2 = 1/3) chisq_gl(gl = gl, g1 = g1, g2 = g2)## null sim set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 0, xi2 = 1/3) chisq_gl(gl = gl, g1 = g1, g2 = g2)
Provides the upper bounds on the double reduction rates based on the formulas in Huang et al. (2019). There are two upper bounds provided. The upper bound from complete equational separation is higher than the upper bound from the pure random chromatid segregation.
drbounds(ploidy, model = c("ces", "prcs"))drbounds(ploidy, model = c("ces", "prcs"))
ploidy |
The ploidy |
model |
Either complete equational segregation ( |
A vector of length floor(ploidy / 4) containing the upper bounds
on the rates of double reduction. The ithe element is the
upper bound on the probability that there are i pairs of
identical by double reduction alleles in a gamete.
David Gerard
Haldane, J. B. S. (1930). Theoretical genetics of autopolyploids. Journal of genetics, 22, 359-372. doi:10.1007/BF02984197
Huang, K., Wang, T., Dunn, D. W., Zhang, P., Cao, X., Liu, R., & Li, B. (2019). Genotypic frequencies at equilibrium for polysomic inheritance under double-reduction. G3: Genes, Genomes, Genetics, 9(5), 1693-1706. doi:10.1534/g3.119.400132
Mather, K. (1935). Reductional and equational separation of the chromosomes in bivalents and multivalents. Journal of genetics, 30, 53-78. doi:10.1007/BF02982205
drbounds(4) drbounds(6) drbounds(8) drbounds(10)drbounds(4) drbounds(6) drbounds(8) drbounds(10)
EM algorithm to estimate prior genotype probabilities from genotype likelihoods.
em_li(B, itermax = 100L, eps = 1e-05)em_li(B, itermax = 100L, eps = 1e-05)
B |
Matrix of genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. |
itermax |
The maximum number of iterations. |
eps |
The stopping criteria. |
A vector of log prior probabilities for each genotype.
David Gerard
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987-2993. doi:10.1093/bioinformatics/btr509
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Run em lprob <- em_li(B = gl) # Exponentiate to get probabilities prob <- exp(c(lprob)) prob# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Run em lprob <- em_li(B = gl) # Exponentiate to get probabilities prob <- exp(c(lprob)) prob
Returns the gamete frequencies for autopolyploids, allopolyploids, and segmental allopolyploids, accounting for the effects of double reduction and partial preferential pairing.
gamfreq( g, ploidy, gamma = NULL, alpha = NULL, beta = NULL, type = c("mix", "polysomic"), add_dr = TRUE )gamfreq( g, ploidy, gamma = NULL, alpha = NULL, beta = NULL, type = c("mix", "polysomic"), add_dr = TRUE )
g |
Parent genotype. |
ploidy |
Parent ploidy. Should be even, and between 2 and 20 (inclusive). Let me know if you need the ploidy to be higher. I can update the package really easily. |
gamma |
The mixture proportions for the pairing configurations.
The proportions are in the same order the configurations in
|
alpha |
The double reduction rate(s) (if using). Defaults to 0's. |
beta |
The double reduction adjustment for simplex markers if
|
type |
Either |
add_dr |
A logical. If |
The vector of gamete frequencies. Element i is the
probability a gamete has genotype i - 1.
If type = "polysomic", then the gamete frequencies correspond
to those of Huang et al (2019). Those formulas are for general multiallelic
loci, so see also Appendix G of Gerard (2022) for special case of
biallelic loci. The relevant parameter is alpha, a vector of
length floor(ploidy / 4), where alpha[[i]] is the
probability that there are i pairs of double reduced alleles
in a gamete. The theoretical upper bound on alpha is given in
drbounds().
If type = "mix" and add_dr = FALSE, then the gamete
frequencies correspond to the pairing configuration model of
Gerard et al (2018). This model states that the gamete frequencies are
a convex combination of the disomic inheritance frequencies. The weights
of this convex combination are provided in the gamma parameter. The
total number of disomic segregation patterns is given by
n_pp_mix(). The order of these segregation patterns used is
the order in seg.
The model for type = "mix" and add_dr = TRUE is the same
as for type = "mix" and add_dr = FALSE except at
parental simplex loci. At such loci, there are no effects of preferential
pairing, and so the option add_dr = TRUE allows for the effects
of double reduction at simplex loci. The relevant parameter here is
beta. The first three gamete frequencies at simplex loci are
c(0.5 + beta, 0.5 - 2 * beta, beta), and the rest are 0. The
upper bound on beta for two different models are given by
beta_bounds().
David Gerard
Gerard, D. (2023). Double reduction estimation and equilibrium tests in natural autopolyploid populations. Biometrics, 79(3), 2143-2156. doi:10.1111/biom.13722
Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping polyploids from messy sequencing data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468
Huang, K., Wang, T., Dunn, D. W., Zhang, P., Cao, X., Liu, R., & Li, B. (2019). Genotypic frequencies at equilibrium for polysomic inheritance under double-reduction. G3: Genes, Genomes, Genetics, 9(5), 1693-1706. doi:10.1534/g3.119.400132
## Various duplex models gamfreq(g = 2, ploidy = 4, gamma = c(0, 1), type = "mix") gamfreq(g = 2, ploidy = 4, gamma = c(1, 0), type = "mix") gamfreq(g = 2, ploidy = 4, gamma = c(0.5, 0.5), type = "mix") gamfreq(g = 2, ploidy = 4, alpha = 0, type = "polysomic") gamfreq(g = 2, ploidy = 4, alpha = 1/6, type = "polysomic") ## Various simplex models gamfreq(g = 1, ploidy = 4, beta = 1/24, gamma = 1, type = "mix", add_dr = TRUE) gamfreq(g = 1, ploidy = 4, alpha = 1/6, type = "polysomic") gamfreq(g = 1, ploidy = 4, gamma = 1, type = "mix", add_dr = FALSE) gamfreq(g = 1, ploidy = 4, alpha = 0, type = "polysomic")## Various duplex models gamfreq(g = 2, ploidy = 4, gamma = c(0, 1), type = "mix") gamfreq(g = 2, ploidy = 4, gamma = c(1, 0), type = "mix") gamfreq(g = 2, ploidy = 4, gamma = c(0.5, 0.5), type = "mix") gamfreq(g = 2, ploidy = 4, alpha = 0, type = "polysomic") gamfreq(g = 2, ploidy = 4, alpha = 1/6, type = "polysomic") ## Various simplex models gamfreq(g = 1, ploidy = 4, beta = 1/24, gamma = 1, type = "mix", add_dr = TRUE) gamfreq(g = 1, ploidy = 4, alpha = 1/6, type = "polysomic") gamfreq(g = 1, ploidy = 4, gamma = 1, type = "mix", add_dr = FALSE) gamfreq(g = 1, ploidy = 4, alpha = 0, type = "polysomic")
Converts genotype counts to genotype vectors.
gcount_to_gvec(gcount)gcount_to_gvec(gcount)
gcount |
The vector of genotype counts. |
A vector of length sum(gcount), containing gcount[1]
copies of 0, gcount[2] copies of 1, gcount[3]
copies of 2, etc.
David Gerard
gcount <- c(1, 2, 3, 0, 5) gcount_to_gvec(gcount = gcount)gcount <- c(1, 2, 3, 0, 5) gcount_to_gvec(gcount = gcount)
Genotype frequencies of an F1 population under a generalized model.
gf_freq( p1_g, p1_ploidy, p1_gamma = NULL, p1_alpha = NULL, p1_beta = NULL, p1_type = c("mix", "polysomic"), p1_add_dr = TRUE, p2_g, p2_ploidy, p2_gamma = NULL, p2_alpha = NULL, p2_beta = NULL, p2_type = c("mix", "polysomic"), p2_add_dr = TRUE, pi = 0, nudge = sqrt(.Machine$double.eps) )gf_freq( p1_g, p1_ploidy, p1_gamma = NULL, p1_alpha = NULL, p1_beta = NULL, p1_type = c("mix", "polysomic"), p1_add_dr = TRUE, p2_g, p2_ploidy, p2_gamma = NULL, p2_alpha = NULL, p2_beta = NULL, p2_type = c("mix", "polysomic"), p2_add_dr = TRUE, pi = 0, nudge = sqrt(.Machine$double.eps) )
p1_g, p1_ploidy, p1_gamma, p1_alpha, p1_beta, p1_type, p1_add_dr
|
The first parent's version of the parameters in |
p2_g, p2_ploidy, p2_gamma, p2_alpha, p2_beta, p2_type, p2_add_dr
|
The second parent's version of the parameters in |
pi |
The proportion of outliers. |
nudge |
Zeros go to nudge |
A vector of genotype frequencies. Element i is the probability
and offspring has genotype i - 1.
David Gerard
gamfreq().
q <- gf_freq( p1_g = 2, p1_ploidy = 4, p1_gamma = c(0.1, 0.9), p1_type = "mix", p2_g = 2, p2_ploidy = 6, p2_alpha = 0.1, p2_type = "polysomic", pi = 0.05)q <- gf_freq( p1_g = 2, p1_ploidy = 4, p1_gamma = c(0.1, 0.9), p1_type = "mix", p2_g = 2, p2_ploidy = 6, p2_alpha = 0.1, p2_type = "polysomic", pi = 0.05)
gcount_to_gvec().Inverse function of gcount_to_gvec().
gvec_to_gcount(gvec, ploidy = 4)gvec_to_gcount(gvec, ploidy = 4)
gvec |
The vector of genotypes. |
ploidy |
The ploidy of the species. |
A vector of counts. Element k is the number of individuals
with genotype k-1.
David Gerard
gvec <- c(1, 2, 3, 2, 3, 1, 4, 0, 1, 0, 0, 1, 0, 0) gvec_to_gcount(gvec = gvec)gvec <- c(1, 2, 3, 2, 3, 1, 4, 0, 1, 0, 0, 1, 0, 0) gvec_to_gcount(gvec = gvec)
There is a dependence on the bounds of two-parameter model. This function returns TRUE if those bounds are satisfied and FALSE otherwise.
is_valid_2(dr, pp, drbound = 1/6)is_valid_2(dr, pp, drbound = 1/6)
dr |
The double reduction rate. |
pp |
The preferential pairing parameter. |
drbound |
The maximum double reduction rate possible. |
TRUE if the model is valid, FALSE otherwise.
David Gerard
TOL <- 1e-6 is_valid_2(dr = 1/6, pp = 1/3, drbound = 1/6) # Valid is_valid_2(dr = 1/6, pp = 1/3 - TOL, drbound = 1/6) # Not valid is_valid_2(dr = 1/6, pp = 1/3 + TOL, drbound = 1/6) # Not validTOL <- 1e-6 is_valid_2(dr = 1/6, pp = 1/3, drbound = 1/6) # Valid is_valid_2(dr = 1/6, pp = 1/3 - TOL, drbound = 1/6) # Not valid is_valid_2(dr = 1/6, pp = 1/3 + TOL, drbound = 1/6) # Not valid
Iterator over array
## S3 method for class 'array' iter(obj, by = 1, recycle = FALSE, ...)## S3 method for class 'array' iter(obj, by = 1, recycle = FALSE, ...)
obj |
An array. |
by |
The dimension to iterate over. |
recycle |
Should the iterator reset? |
... |
not used |
An iterator. This is an S3 arrayiter object, used in conjunction with nextElem to iterate over one index of an array.
David Gerard
glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
This is under the two parameter model.
like_gknown_2(x, alpha, xi1, xi2, g1, g2, log_p = TRUE, pen = 0)like_gknown_2(x, alpha, xi1, xi2, g1, g2, log_p = TRUE, pen = 0)
x |
A vector of length 5. |
alpha |
The double reduction rate. |
xi1 |
The preferential pairing parameter of parent 1. |
xi2 |
The preferential pairing parameter of parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
pen |
A tiny penalty to help with numerical stability |
The (log) likelihood.
David Gerard
x <- c(1, 4, 5, 3, 1) alpha <- 0.01 xi1 <- 0.5 xi2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_2( x = x, alpha = alpha, xi1 = xi1, xi2 = xi2, g1 = g1, g2 = g2)x <- c(1, 4, 5, 3, 1) alpha <- 0.01 xi1 <- 0.5 xi2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_2( x = x, alpha = alpha, xi1 = xi1, xi2 = xi2, g1 = g1, g2 = g2)
This is under the three parameter model.
like_gknown_3(x, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE, pen = 0)like_gknown_3(x, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE, pen = 0)
x |
A vector of length 5. |
tau |
The probability of quadrivalent formation. |
beta |
The probability of double reduction given quadrivalent formation. |
gamma1 |
The probability of AA_aa pairing for parent 1. |
gamma2 |
The probability of AA_aa pairing for parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
pen |
A tiny penalty to help with numerical stability |
The (log) likelihood.
David Gerard
x <- c(1, 4, 5, 3, 1) tau <- 0.5 beta <- 0.1 gamma1 <- 0.5 gamma2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_3( x = x, tau = tau, beta = beta, gamma1 = gamma1, gamma2 = gamma2, g1 = g1, g2 = g2)x <- c(1, 4, 5, 3, 1) tau <- 0.5 beta <- 0.1 gamma1 <- 0.5 gamma2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_3( x = x, tau = tau, beta = beta, gamma1 = gamma1, gamma2 = gamma2, g1 = g1, g2 = g2)
This is under the two parameter model.
like_glpknown_2(gl, alpha, xi1, xi2, g1, g2, log_p = TRUE)like_glpknown_2(gl, alpha, xi1, xi2, g1, g2, log_p = TRUE)
gl |
The matrix of genotype likelihoods of the offspring. Rows index The individuals, columns index the genotypes. |
alpha |
The double reduction rate. |
xi1 |
The preferential pairing parameter of parent 1. |
xi2 |
The preferential pairing parameter of parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
The (log) likelihood of the two parameter model when using genotype likelihoods.
David Gerard
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_2( gl = gl, alpha = 0.01, xi1 = 0.5, xi2 = 0.3, g1 = g1, g2 = g2, log_p = TRUE)g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_2( gl = gl, alpha = 0.01, xi1 = 0.5, xi2 = 0.3, g1 = g1, g2 = g2, log_p = TRUE)
This is under the three parameter model.
like_glpknown_3(gl, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE)like_glpknown_3(gl, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE)
gl |
The matrix of genotype likelihoods of the offspring. Rows index The individuals, columns index the genotypes. |
tau |
The probability of quadrivalent formation. |
beta |
The probability of double reduction given quadrivalent formation. |
gamma1 |
The probability of AA_aa pairing for parent 1. |
gamma2 |
The probability of AA_aa pairing for parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
The (log) likelihood of the three parameter model when using genotype likelihoods.
David Gerard
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_3( gl = gl, tau = 1/2, beta = 1/12, gamma1 = 1/3, gamma2 = 1/3, g1 = g1, g2 = g2, log_p = TRUE)g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_3( gl = gl, tau = 1/2, beta = 1/12, gamma1 = 1/3, gamma2 = 1/3, g1 = g1, g2 = g2, log_p = TRUE)
em_li()
Objective function for em_li()
llike_li(B, lpivec)llike_li(B, lpivec)
B |
The log-likelihood matrix. Rows are individuals columns are genotypes. |
lpivec |
The log prior vector. |
The log-likelihood of a vector of genotype frequencies when using genotype likelihoods. This is from Li (2011).
David Gerard
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987-2993. doi:10.1093/bioinformatics/btr509
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Log-likelihood at given log-priors prob <- c(0.1, 0.2, 0.4, 0.2, 0.1) lprob <- log(prob) llike_li(B = gl, lpivec = lprob)# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Log-likelihood at given log-priors prob <- c(0.1, 0.2, 0.4, 0.2, 0.1) lprob <- log(prob) llike_li(B = gl, lpivec = lprob)
This will run a likelihood ratio test using the genotypes of an F1 population of tetraploids for the null of Mendelian segregation (accounting for double reduction and preferential pairing) against the alternative of segregation distortion. This is when the genotypes are assumed known.
lrt_men_g4( x, g1, g2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )lrt_men_g4( x, g1, g2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
x |
A vector of genotype counts. |
g1 |
The genotype of parent 1. |
g2 |
The genotype of parent 2. |
drbound |
The maximum rate of double reduction. A default of 1/6 is provided, which is the rate under the complete equational segregation model of meiosis. |
pp |
A logical. Should we account for preferential pairing
( |
dr |
A logical. Should we account for double reduction
( |
alpha |
If |
xi1 |
If |
xi2 |
If |
A list with the following elements
statisticThe log-likelihood ratio test statistic.
dfThe degrees of freedom.
p_valueThe p-value.
alphaThe estimated double reduction rate.
xi1The estimated preferential pairing parameter of parent 1.
xi2The estimated preferential pairing parameter of parent 2.
Some offspring genotype combinations are impossible given the parental genotypes. If these impossible genotypes combinations show up, we return a p-value of 0, a log-likelihood ratio statistic of Infinity, and missing values for all other return items. The impossible genotypes are:
g1 = 0 && g2 = 0Only offspring genotypes of 0 are possible.
g1 = 4 && g2 = 4Only offspring genotypes of 4 are possible.
g1 = 0 && g2 = 4 || g1 == 4 && g2 == 0Only offspring genotypes of 2 are possible.
g1 = 0 && g2 %in% c(1, 2, 3) || g1 = %in% c(1, 2, 3) && g2 == 0Only offspring genotypes of 0, 1, and 2 are possible.
g1 = 4 && g2 %in% c(1, 2, 3) || g1 = %in% c(1, 2, 3) && g2 == 4Only offspring genotypes of 2, 3, and 4 are possible.
When g1 = 2 or g2 = 2 (or both), the model is not identified
and those estimates (alpha, xi1, and xi2) are
meaningless. Do NOT interpret them.
The estimate of alpha (double reduction rate) IS identified as
long as at least one parent is simplex, and no parent is duplex.
However, the estimates of the double reduction rate have extremely high
variance.
David Gerard
set.seed(100) gf <- offspring_gf_2(alpha = 1/12, xi1 = 0.2, xi2 = 0.6, p1 = 1, p2 = 0) x <- offspring_geno(gf = gf, n = 100) lrt_men_g4(x = x, g1 = 1, g2 = 0)set.seed(100) gf <- offspring_gf_2(alpha = 1/12, xi1 = 0.2, xi2 = 0.6, p1 = 1, p2 = 0) x <- offspring_geno(gf = gf, n = 100) lrt_men_g4(x = x, g1 = 1, g2 = 0)
This will run a likelihood ratio test using the genotypes of an F1 population of tetraploids for the null of Mendelian segregation (accounting for double reduction and preferential pairing) against the alternative of segregation distortion. This is when genotype uncertainty is accounted for through genotype likelihoods.
lrt_men_gl4( gl, g1 = NULL, g2 = NULL, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )lrt_men_gl4( gl, g1 = NULL, g2 = NULL, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
gl |
The genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. |
g1 |
Either parent 1's genotype, or parent 1's genotype log-likelihoods. |
g2 |
Either parent 2's genotype, or parent 2's genotype log-likelihoods. |
drbound |
The upper bound on the double reduction rate. |
pp |
Is (partial) preferential pairing possible ( |
dr |
Is double reduction possible ( |
alpha |
If |
xi1 |
If |
xi2 |
If |
A list with the following elements
statisticThe log-likelihood ratio test statistic.
dfThe degrees of freedom.
p_valueThe p-value.
alphaThe estimated double reduction rate.
xi1The estimated preferential pairing parameter of parent 1.
xi2The estimated preferential pairing parameter of parent 2.
When g1 = 2 or g2 = 2 (or both), the model is not identified
and those estimates (alpha, xi1, and xi2) are
meaningless. Do NOT interpret them.
The estimate of alpha (double reduction rate) IS identified as
long as at least one parent is simplex, and no parent is duplex.
However, the estimates of the double reduction rate have extremely high
variance.
David Gerard
## null simulation set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 1/12, xi2 = 1/2) ## LRT when parent genotypes are known. lrt_men_gl4(gl = gl, g1 = g1, g2 = g2) ## LRT when parent genotypes are not known lrt_men_gl4(gl = gl) ## Alternative simulation gl <- simgl(nvec = rep(5, 5)) lrt_men_gl4(gl = gl, g1 = g1, g2 = g2)## null simulation set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 1/12, xi2 = 1/2) ## LRT when parent genotypes are known. lrt_men_gl4(gl = gl, g1 = g1, g2 = g2) ## LRT when parent genotypes are not known lrt_men_gl4(gl = gl) ## Alternative simulation gl <- simgl(nvec = rep(5, 5)) lrt_men_gl4(gl = gl, g1 = g1, g2 = g2)
Uses the future package to implement parallelization support for
the likelihood ratio tests for segregation distortion. This function
only works for tetraploids, and cannot account for outliers. For
higher ploidies and more functionality, see seg_multi().
multi_lrt( g, p1, p2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3, nullprop = FALSE )multi_lrt( g, p1, p2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3, nullprop = FALSE )
g |
One of two inputs
|
p1 |
One of three inputs
|
p2 |
One of three inputs
|
drbound |
The upper bound on the double reduction rate. |
pp |
Is (partial) preferential pairing possible ( |
dr |
Is double reduction possible ( |
alpha |
If |
xi1 |
If |
xi2 |
If |
nullprop |
Should we return the null proportions ( |
A data frame with the following elements:
statisticThe likelihood ratio test statistic
p_valueThe p-value of the likelihood ratio test.
dfThe degrees of freedom of the test.
alphaThe MLE of the double reduction rate. Do not use for real work.
xi1The MLE of the first parent's partial preferential pairing parameter. Do not use for real work.
xi2The MLE of the second parent's partial preferential pairing parameter. Do not use for real work.
p1(Estimate of) the first parent's genotype.
p2(Estimate of) the second parent's genotype.
snpThe name of the SNP.
The multi_lrt() function supports parallel computing. It does
so through the future
package.
You first specify the evaluation plan with plan()
from the future package. On a local machine, this is typically
just future::plan(future::multisession, workers = nc) where
nc is the number of workers you want. You can find the maximum
number of possible workers with availableCores().
You then run multi_lrt(), then shut down the workers with
future::plan(future::sequential).
David Gerard
lrt_men_g4() Single locus LRT for segregation distortion when genotypes are known.
lrt_men_gl4() Single locus LRT for segregation distortion when using genotype likelihoods.
## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g multi_lrt(g = g_1, p1 = p1_1, p2 = p2_1) ## Using genotype likelihoods (typically a good idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) ## Offspring genotype likelihoods and parent genotypes known multi_lrt(g = g_2, p1 = p1_1, p2 = p2_1) ## Offspring genotype likelihoods and no information on parent genotypes multi_lrt(g = g_2, p1 = NULL, p2 = NULL) ## Parallel computing is supported through the future package # future::plan(future::multisession, workers = 2) # multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) # future::plan(future::sequential)## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g multi_lrt(g = g_1, p1 = p1_1, p2 = p2_1) ## Using genotype likelihoods (typically a good idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) ## Offspring genotype likelihoods and parent genotypes known multi_lrt(g = g_2, p1 = p1_1, p2 = p2_1) ## Offspring genotype likelihoods and no information on parent genotypes multi_lrt(g = g_2, p1 = NULL, p2 = NULL) ## Parallel computing is supported through the future package # future::plan(future::multisession, workers = 2) # multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) # future::plan(future::sequential)
Converts multidog output to a format usable for seg_multi() and multi_lrt()
multidog_to_g( mout, ploidy, type = c("off_gl", "all_gl", "off_g", "all_g"), p1 = NULL, p2 = NULL )multidog_to_g( mout, ploidy, type = c("off_gl", "all_gl", "off_g", "all_g"), p1 = NULL, p2 = NULL )
mout |
The output of |
ploidy |
The ploidy. |
type |
|
p1 |
The first (or only) parent name if using |
p2 |
The second parent name if using |
A list with the following elements
gEither a matrix of counts, where the columns index the genotype
and the rows index the loci (type = "all_g" or
type = "off_g"). Or an array of genotype (natural) log-likelihoods
where the rows index the loci, the columns index the
individuals, and the slices index the genotypes
(type = "all_gl" or type = "off_gl").
p1Either a vector of known parental genotypes
(type = "off_gl", type = "all_g" or type = "off_g").
Or a matrix of genotype (natural) log-likelihoods where the
rows index the loci and the columns index the genotypes
(type = "all_gl").
p2Either a vector of known parental genotypes
(type = "off_gl", type = "all_g" or type = "off_g").
Or a matrix of genotype (natural) log-likelihoods where the
rows index the loci and the columns index the genotypes
(type = "all_gl").
This will be NULL if you (i) used "s1" or
"s1pp" models in updog and used either
type = "off_g" or type = "off_gl" or
(ii) used type = "all_g" or type = "all_gl"
and only specified p1 but not p2.
David Gerard
multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit2, ploidy = 4, type = "off_g") multidog_to_g(mout = ufit2, ploidy = 4, type = "off_gl") multidog_to_g(mout = ufit3, ploidy = 4, type = "off_g") multidog_to_g(mout = ufit3, ploidy = 4, type = "off_gl")multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit2, ploidy = 4, type = "off_g") multidog_to_g(mout = ufit2, ploidy = 4, type = "off_gl") multidog_to_g(mout = ufit3, ploidy = 4, type = "off_g") multidog_to_g(mout = ufit3, ploidy = 4, type = "off_gl")
The number of disomic inheritance patterns for a given ploidy and a given
parental dosage. See also seg for the list of all possible
disomic inheritance patterns for even ploidies up to 20.
n_pp_mix(g, ploidy)n_pp_mix(g, ploidy)
g |
parent genotype |
ploidy |
parent ploidy |
The number of mixture components.
n_pp_mix(g = 0, ploidy = 4) n_pp_mix(g = 1, ploidy = 4) n_pp_mix(g = 2, ploidy = 4) n_pp_mix(g = 3, ploidy = 4) n_pp_mix(g = 4, ploidy = 4) n_pp_mix(g = 0, ploidy = 6) n_pp_mix(g = 1, ploidy = 6) n_pp_mix(g = 2, ploidy = 6) n_pp_mix(g = 3, ploidy = 6) n_pp_mix(g = 4, ploidy = 6) n_pp_mix(g = 5, ploidy = 6) n_pp_mix(g = 6, ploidy = 6)n_pp_mix(g = 0, ploidy = 4) n_pp_mix(g = 1, ploidy = 4) n_pp_mix(g = 2, ploidy = 4) n_pp_mix(g = 3, ploidy = 4) n_pp_mix(g = 4, ploidy = 4) n_pp_mix(g = 0, ploidy = 6) n_pp_mix(g = 1, ploidy = 6) n_pp_mix(g = 2, ploidy = 6) n_pp_mix(g = 3, ploidy = 6) n_pp_mix(g = 4, ploidy = 6) n_pp_mix(g = 5, ploidy = 6) n_pp_mix(g = 6, ploidy = 6)
This is applied to an arrayiter object to obtain the next sub-array
along one of the dimensions.
## S3 method for class 'arrayiter' nextElem(obj, ...)## S3 method for class 'arrayiter' nextElem(obj, ...)
obj |
An arrayiter object |
... |
not used |
The next sub-array.
David Gerard
glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
Takes as input the offspring genotype frequencies and a sample size and returns simulated genotypes.
offspring_geno(gf, n)offspring_geno(gf, n)
gf |
Vector of offspring genotype frequencies |
n |
Sample size |
Simulated genotypes
Mira Thakkar
set.seed(1) gf <- offspring_gf_2(alpha = 1/6, xi1 = 1/3, xi2 = 1/3, p1 = 2, p2 = 3) offspring_geno(gf = gf, n = 10)set.seed(1) gf <- offspring_gf_2(alpha = 1/6, xi1 = 1/3, xi2 = 1/3, p1 = 2, p2 = 3) offspring_geno(gf = gf, n = 10)
Calculates offspring genotype frequencies under the two-parameter model.
offspring_gf_2(alpha, xi1, xi2 = xi1, p1, p2)offspring_gf_2(alpha, xi1, xi2 = xi1, p1, p2)
alpha |
The double reduction rate |
xi1 |
The preferential pairing parameter of the first parent. |
xi2 |
The preferential pairing parameter of the second parent. |
p1 |
The first parent's genotype |
p2 |
The second parent's genotype |
Offspring genotype frequencies
Mira Thakkar
alpha <- 1/6 xi1 <- 1/3 xi2 <- 1/3 p1 <- 2 p2 <- 3 offspring_gf_2(alpha = alpha, xi1 = xi1, xi2 = xi2, p1 = p1, p2 = p2)alpha <- 1/6 xi1 <- 1/3 xi2 <- 1/3 p1 <- 2 p2 <- 3 offspring_gf_2(alpha = alpha, xi1 = xi1, xi2 = xi2, p1 = p1, p2 = p2)
Calculates offspring genotype frequencies under the three-parameter model.
offspring_gf_3(tau, beta, gamma1, gamma2 = gamma1, p1, p2)offspring_gf_3(tau, beta, gamma1, gamma2 = gamma1, p1, p2)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma1 |
Probability of AA_aa pairing in parent 1 |
gamma2 |
Probability of AA_aa pairing in parent 2 |
p1 |
The first parent's genotype |
p2 |
The second parent's genotype |
Offspring genotype frequencies
David Gerard
offspring_gf_3( tau = 1/2, beta = 1/6, gamma1 = 1/3, gamma2 = 1/3, p1 = 1, p2 = 2)offspring_gf_3( tau = 1/2, beta = 1/6, gamma1 = 1/3, gamma2 = 1/3, p1 = 1, p2 = 2)
This is experimental. I haven't tested it out in lots of scenarios yet.
otest_g( x, g1, g2, pbad = 0.03, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )otest_g( x, g1, g2, pbad = 0.03, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
x |
A vector of genotype counts. |
g1 |
The genotype of parent 1. |
g2 |
The genotype of parent 2. |
pbad |
The upper bound on the number of bad genotypes |
drbound |
The maximum rate of double reduction. A default of 1/6 is provided, which is the rate under the complete equational segregation model of meiosis. |
pp |
A logical. Should we account for preferential pairing
( |
dr |
A logical. Should we account for double reduction
( |
alpha |
If |
xi1 |
If |
xi2 |
If |
Here, we test if the compatible genotypes are consistent with F1 populations and separately test that the number of incompatible genotypes isn't too large (less than 3 percent by default). This is the strategy the polymapR software uses. But we use a Bonferroni correction to combine these tests (minimum of two times the p-values), while they just multiply the p-values together. So our approach accounts for double reduction and preferential pairing, while also controlling the family-wise error rate.
A list with the following elements
statisticThe log-likelihood ratio test statistic.
dfThe degrees of freedom.
p_valueThe Bonferroni corrected p-value.
p_lrtThe p-value of the LRT.
p_binomThe p-value of the one-sided binomial test.
alphaThe estimated double reduction rate.
xi1The estimated preferential pairing parameter of parent 1.
xi2The estimated preferential pairing parameter of parent 2.
David Gerard
# Run a test where genotypes 0, 1, and 2 are possible x <- c(10, 10, 4, 0, 5) otest_g(x = x, g1 = 1, g2 = 0) # polymapR's multiplication and the Bonferroni differ df <- expand.grid(p1 = seq(0, 1, length.out = 20), p2 = seq(0, 1, length.out = 20)) df$polymapr <- NA df$bonferroni <- NA for (i in seq_len(nrow(df))) { df$polymapr[[i]] <- df$p1[[i]] * df$p2[[i]] df$bonferroni[[i]] <- 2 * min(c(df$p1[[i]], df$p2[[i]], 0.5)) } graphics::plot(df$polymapr, df$bonferroni)# Run a test where genotypes 0, 1, and 2 are possible x <- c(10, 10, 4, 0, 5) otest_g(x = x, g1 = 1, g2 = 0) # polymapR's multiplication and the Bonferroni differ df <- expand.grid(p1 = seq(0, 1, length.out = 20), p2 = seq(0, 1, length.out = 20)) df$polymapr <- NA df$bonferroni <- NA for (i in seq_len(nrow(df))) { df$polymapr[[i]] <- df$p1[[i]] * df$p2[[i]] df$bonferroni[[i]] <- 2 * min(c(df$p1[[i]], df$p2[[i]], 0.5)) } graphics::plot(df$polymapr, df$bonferroni)
Takes as input (i) the parent genotypes, (ii) the offspring genotype frequency, (iii) sequencing error rate, (iv) read depth, (v) bias, and (vi) overdispersion. It returns genotype likelihoods.
po_gl( genovec, ploidy, p1_geno = NULL, p2_geno = NULL, rd = 10, seq = 0.01, bias = 1, od = 0.01 )po_gl( genovec, ploidy, p1_geno = NULL, p2_geno = NULL, rd = 10, seq = 0.01, bias = 1, od = 0.01 )
genovec |
Offspring genotypes. |
ploidy |
Ploidy |
p1_geno |
Parent 1 genotype |
p2_geno |
Parent 2 genotype |
rd |
Read depth. Lower is more uncertain. |
seq |
Sequencing error rate. Higher means more uncertain. |
bias |
Bias. 1 means no bias. |
od |
Overdispersion. Typical value is like 0.01. Higher means more uncertain. |
Genotype likelihoods
Mira Thakkar
set.seed(1) po_gl(genovec = c(1, 2, 1, 1, 3), p1_geno = 2, p2_geno = 2, ploidy = 4)set.seed(1) po_gl(genovec = c(1, 2, 1, 1, 3), p1_geno = 2, p2_geno = 2, ploidy = 4)
The polymapR package tests for segregation distortion by iterating through all
possible forms of disomic or polysomic inheritance from either parent,
tests for concordance of the offspring genotypes using a chi-squared
test, and returns the largest p-value. It sometimes chooses a different
p-value based on other heuristics. They also sometimes return NA.
When type = "segtest", we only look at patterns of the
given parent genotypes, choosing the largest p-value. When
type = "polymapR", we return what they use via their heuristics.
polymapr_test(x, g1 = NULL, g2 = NULL, type = c("segtest", "polymapR"))polymapr_test(x, g1 = NULL, g2 = NULL, type = c("segtest", "polymapR"))
x |
Either a vector of genotype counts, or a matrix of genotype posteriors where the rows index the individuals and the columns index the genotypes. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
type |
Either my implementation which approximates that of
polymapR ( |
A list with the following elements:
The p-value of the test.
The genotype frequencies of the best fit model.
The frequency of invalid genotypes.
The p-value of the invalid proportion.
David Gerard
checkF1().
g1 <- 0 g2 <- 1 x <- c(4, 16, 0, 0, 0) polymapr_test(x = x, g1 = g1, g2 = g2, type = "segtest") polymapr_test(x = x, g1 = g1, g2 = g2, type = "polymapR")g1 <- 0 g2 <- 1 x <- c(4, 16, 0, 0, 0) polymapr_test(x = x, g1 = g1, g2 = g2, type = "segtest") polymapr_test(x = x, g1 = g1, g2 = g2, type = "polymapR")
This is under the two parameter model.
pvec_tet_2(alpha, xi, ell)pvec_tet_2(alpha, xi, ell)
alpha |
The double reduction rate |
xi |
The preferential pairing parameter |
ell |
The parental genotype |
The gamete genotype frequencies
Mira Thakkar and David Gerard
alpha <- 1/6 xi <- 1/3 pvec_tet_2(alpha = alpha, xi = xi, ell = 0) pvec_tet_2(alpha = alpha, xi = xi, ell = 1) pvec_tet_2(alpha = alpha, xi = xi, ell = 2) pvec_tet_2(alpha = alpha, xi = xi, ell = 3) pvec_tet_2(alpha = alpha, xi = xi, ell = 4)alpha <- 1/6 xi <- 1/3 pvec_tet_2(alpha = alpha, xi = xi, ell = 0) pvec_tet_2(alpha = alpha, xi = xi, ell = 1) pvec_tet_2(alpha = alpha, xi = xi, ell = 2) pvec_tet_2(alpha = alpha, xi = xi, ell = 3) pvec_tet_2(alpha = alpha, xi = xi, ell = 4)
This is under the three parameter model.
pvec_tet_3(tau, beta, gamma, ell)pvec_tet_3(tau, beta, gamma, ell)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma |
Probability of AA/aa pairing given bivalent formation |
ell |
The parent genotype |
The gamete genotype frequencies
David Gerard
pvec_tet_3(tau = 0.5, beta = 0.1, gamma = 0.5, ell = 2)pvec_tet_3(tau = 0.5, beta = 0.1, gamma = 0.5, ell = 2)
Gamete frequencies for all possible disomic and polysomic segregation patterns for even ploidies 2 through 20. If you need higher ploidy levels, let me know and I'll update it (it's very easy).
segseg
A data frame with the following columns
ploidyThe ploidy of the parent.
gThe genotype of the parent.
mThe pairing configuration given disomic inheritance. See Gerard et al (2018).
pThe gamete frequencies. Element p[[i]] is the probability a gamete will have dosage i-1.
modeWhether the inheritance pattern that leads to these gamete frequencies is "disomic", "polysomic", or "both".
David Gerard
Gerard, D., Ferrão, L. F. V., Garcia, A. A. F., & Stephens, M. (2018). Genotyping polyploids from messy sequencing data. Genetics, 210(3), 789-807. doi:10.1534/genetics.118.301468
Provides tests for segregation distortion for an F1 population of polyploids under various models of meiosis. You can use this test for autopolyploids that exhibit full polysomic inheritance, allopolyploids that exhibit full disomic inheritance, or segmental allopolyploids that exhibit partial preferential pairing. Double reduction is (optionally) fully accounted for in tetraploids, and (optionally) partially accounted for (only at simplex loci) for higher ploidies. Some maximum proportion of outliers can be specified (default at 3%), and so this method can accommodate moderate levels of double reduction at non-simplex loci. Offspring genotypes can either be known, or genotype uncertainty can be represented through genotype likelihoods. Parent data may or may not be provided, at your option. Parents can have different (even) ploidies, at your option. Details of the methods may be found in Gerard et al. (2025).
seg_lrt( x, p1_ploidy, p2_ploidy = p1_ploidy, p1 = NULL, p2 = NULL, model = c("seg", "auto", "auto_dr", "allo", "allo_pp", "auto_allo"), outlier = TRUE, ret_out = FALSE, ob = 0.03, db = c("ces", "prcs"), ntry = 3, opt = c("bobyqa", "L-BFGS-B"), optg = c("NLOPT_GN_MLSL_LDS", "NLOPT_GN_ESCH", "NLOPT_GN_CRS2_LM", "NLOPT_GN_ISRES"), df_tol = 0.001, chisq = FALSE )seg_lrt( x, p1_ploidy, p2_ploidy = p1_ploidy, p1 = NULL, p2 = NULL, model = c("seg", "auto", "auto_dr", "allo", "allo_pp", "auto_allo"), outlier = TRUE, ret_out = FALSE, ob = 0.03, db = c("ces", "prcs"), ntry = 3, opt = c("bobyqa", "L-BFGS-B"), optg = c("NLOPT_GN_MLSL_LDS", "NLOPT_GN_ESCH", "NLOPT_GN_CRS2_LM", "NLOPT_GN_ISRES"), df_tol = 0.001, chisq = FALSE )
x |
The data. Can be one of two forms:
|
p1_ploidy, p2_ploidy
|
The ploidy of the first or second parent. Should be even. |
p1, p2
|
One of three forms:
|
model |
One of six forms:
|
outlier |
A logical. Should we allow for outliers ( |
ret_out |
A logical. Should we return the probability that each individual is an outlier ( |
ob |
The default upper bound on the outlier proportion. |
db |
Should we use the complete equational segregation model ( |
ntry |
The number of times to try the optimization. You probably do not want to touch this. |
opt |
For local optimization, should we use bobyqa (Powell, 2009) or L-BFGS-B (Byrd et al, 1995)? You probably do not want to touch this. |
optg |
Initial global optimization used to start local optimization. Methods are described in the
|
df_tol |
Threshold for the rank of the Jacobian for the degrees of freedom calculation. This accounts for weak identifiability in the null model. You probably do not want to touch this. |
chisq |
A logical. When using known genotypes, this flags to use
the chi-squared test or the Likelihood Ratio Test. Default is |
A list with some or all of the following elements
statThe test statistic.
dfThe degrees of freedom of the test.
p_valueThe p-value of the test.
null_bicThe null model's BIC.
outprobOutlier probabilities. Only returned in ret_out = TRUE.
If using genotype counts, element i is the probability that an individual with genotype i-1 is an outlier. So the return vector has length ploidy plus 1.
If using genotype log-likelihoods, element i is the probability that individual i is an outlier. So the return vector has the same length as the number of individuals.
These outlier probabilities are only valid if the null of no segregation is true.
nullA list with estimates and information on the null model.
l0_ppMaximized likelihood under the null plus the parent log-likelihoods.
l0Maximized likelihood under using estimated parent genotypes are known parent genotypes.
q0Estimated genotype frequencies under the null.
df0Estimated number of parameters under the null.
gamA list of three lists with estimates of the model
parameters. The third list contains the elements outlier
(which is TRUE if outliers were modeled) and pi
(the estimated outlier proportion). The first two lists contain
information on each parent with the following elements:
ploidyThe ploidy of the parent.
gThe (estimated) genotype of the parent.
alphaThe estimated double reduction rate(s). alpha[i] is the estimated probability that a gamete has i copies of identical by double reduction alleles.
betaDouble reduction's effect on simplex loci when type = "mix" and add_dr = TRUE.
gammaThe mixing proportions for the pairing configurations. The order is the same as in seg.
typeEither "mix" or "polysomic"
add_drDid we model double reduction at simplex loci when using type = "mix" (TRUE) or not (FALSE)?
altA list with estimates and information on the alternative model.
l1The maximized likelihood under the alternative.
q1The estimated genotype frequencies under the alternative.
df1The estimated number of parameters under the alternative.
The gamete frequencies under the null model can be calculated via
gamfreq(). The genotype frequencies, which are just
a discrete linear convolution (convolve()) of the
gamete frequencies, can be calculated via gf_freq().
The null model's gamete frequencies for true autopolyploids
(model = "auto") or
true allopolyploids (model = "allo") are given in the seg data frame
that comes with this package. I only made that data frame go up to
ploidy 20, but let me know if you need it for higher ploidies.
The polyRAD folks test for full autopolyploid and full allopolyploid, so I
included that as an option (model = "auto_allo").
We can account for arbitrary levels of double reduction in autopolyploids
(model = "auto_dr") using the gamete frequencies from
Huang et al (2019).
The null model for segmental allopolyploids (model = "allo_pp") is the mixture model of
the possible allopolyploid gamete frequencies. The autopolyploid model
(without double reduction) is a subset of this mixture model.
In the above mixture model, we can account for double reduction for simplex
loci (model = "seg") by just slightly reducing the
number of simplex gametes and increasing the number of duplex and
nullplex gametes. That is, the frequencies for (nullplex, simplex, duplex)
gametes go from (0.5, 0.5, 0) to
(0.5 + b, 0.5 - 2 * b, b).
model = "seg" is the most general, so it is the default. But you
should use other models if you have more information on your species. E.g.
if you know you have an autopolyploid, use either model = "auto"
or model = "auto_dr".
Do NOT interpret the estimated parameters in the null$gam list.
These parameters are weakly identified (I had to do some fancy
spectral methods to account for this in the null distribution
of the tests). Even though they are technically identified, you would
need a massive data set to be able to estimate them accurately.
David Gerard
Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on scientific computing, 16(5), 1190-1208. doi:10.1137/0916069
da Silva Santos, C. H., Goncalves, M. S., & Hernandez-Figueroa, H. E. (2010). Designing novel photonic devices by bio-inspired computing. IEEE Photonics Technology Letters, 22(15), 1177-1179. doi:10.1109/LPT.2010.2051222
Gerard, D, Ambrosano, GB, Pereira, GdS, & Garcia, AAF (2025). Tests for segregation distortion in higher ploidy F1 populations. G3: Genes | Genomes | Genetics, 15(11), jkaf212. doi:10.1093/g3journal/jkaf212
Huang, K., Wang, T., Dunn, D. W., Zhang, P., Cao, X., Liu, R., & Li, B. (2019). Genotypic frequencies at equilibrium for polysomic inheritance under double-reduction. G3: Genes, Genomes, Genetics, 9(5), 1693-1706. doi:10.1534/g3.119.400132
Johnson S (2008). The NLopt nonlinear-optimization package. https://github.com/stevengj/nlopt.
Kaelo, P., & Ali, M. M. (2006). Some variants of the controlled random search algorithm for global optimization. Journal of optimization theory and applications, 130, 253-264. doi:10.1007/s10957-006-9101-0
Kucherenko, S., & Sytsko, Y. (2005). Application of deterministic low-discrepancy sequences in global optimization. Computational Optimization and Applications, 30, 297-318. doi:10.1007/s10589-005-4615-1
Powell, M. J. D. (2009), The BOBYQA algorithm for bound constrained optimization without derivatives, Report No. DAMTP 2009/NA06, Centre for Mathematical Sciences, University of Cambridge, UK.
Runarsson, T. P., & Yao, X. (2005). Search biases in constrained evolutionary optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 35(2), 233-243. doi:10.1109/TSMCC.2004.841906
set.seed(1) p1_ploidy <- 4 p1 <- 1 p2_ploidy <- 8 p2 <- 4 q <- gf_freq( p1_g = p1, p1_ploidy = p1_ploidy, p1_gamma = 1, p1_type = "mix", p2_g = p2, p2_ploidy = p2_ploidy, p2_gamma= c(0.2, 0.2, 0.6), p2_type = "mix", pi = 0.01) nvec <- c(stats::rmultinom(n = 1, size = 200, prob = q)) gl <- simgl(nvec = nvec) seg_lrt(x = nvec, p1_ploidy = p1_ploidy, p2_ploidy = p2_ploidy, p1 = p1, p2 = p2)$p_value seg_lrt(x = gl, p1_ploidy = p1_ploidy, p2_ploidy = p2_ploidy, p1 = p1, p2 = p2)$p_valueset.seed(1) p1_ploidy <- 4 p1 <- 1 p2_ploidy <- 8 p2 <- 4 q <- gf_freq( p1_g = p1, p1_ploidy = p1_ploidy, p1_gamma = 1, p1_type = "mix", p2_g = p2, p2_ploidy = p2_ploidy, p2_gamma= c(0.2, 0.2, 0.6), p2_type = "mix", pi = 0.01) nvec <- c(stats::rmultinom(n = 1, size = 200, prob = q)) gl <- simgl(nvec = nvec) seg_lrt(x = nvec, p1_ploidy = p1_ploidy, p2_ploidy = p2_ploidy, p1 = p1, p2 = p2)$p_value seg_lrt(x = gl, p1_ploidy = p1_ploidy, p2_ploidy = p2_ploidy, p1 = p1, p2 = p2)$p_value
Uses the future package to implement parallelization support for
the likelihood ratio tests for segregation distortion. Details of
this test are provided in the seg_lrt() function's
documentation. See Gerard et al. (2025) for details of the methods.
seg_multi( g, p1_ploidy, p2_ploidy = p1_ploidy, p1 = NULL, p2 = NULL, model = c("seg", "auto", "auto_dr", "allo", "allo_pp", "auto_allo"), outlier = TRUE, ret_out = FALSE, ob = 0.03, db = c("ces", "prcs"), ntry = 3, df_tol = 0.001 )seg_multi( g, p1_ploidy, p2_ploidy = p1_ploidy, p1 = NULL, p2 = NULL, model = c("seg", "auto", "auto_dr", "allo", "allo_pp", "auto_allo"), outlier = TRUE, ret_out = FALSE, ob = 0.03, db = c("ces", "prcs"), ntry = 3, df_tol = 0.001 )
g |
One of two inputs
|
p1_ploidy, p2_ploidy
|
The ploidy of the first or second parent. Should be even. |
p1 |
One of three inputs
|
p2 |
One of three inputs
|
model |
One of six forms:
|
outlier |
A logical. Should we allow for outliers ( |
ret_out |
A logical. Should we return the probability that each individual is an outlier ( |
ob |
The default upper bound on the outlier proportion. |
db |
Should we use the complete equational segregation model ( |
ntry |
The number of times to try the optimization. You probably do not want to touch this. |
df_tol |
Threshold for the rank of the Jacobian for the degrees of freedom calculation. This accounts for weak identifiability in the null model. You probably do not want to touch this. |
A data frame with the following elements:
statisticThe likelihood ratio test statistic
p_valueThe p-value of the likelihood ratio test.
dfThe (estimated) degrees of freedom of the test.
null_bicThe BIC of the null model (no segregation distortion).
df0The (estimated) number of parameters under null.
df1The (estimated) number of parameters under the alternative.
p1The (estimated) genotype of parent 1.
p2The (estimated) genotype of parent 2.
q0The MLE of the genotype frequencies under the null.
q1The MLE of the genotype frequencies under the alternative.
outprobOutlier probabilities. Only returned in ret_out = TRUE.
If using genotype counts, element i is the probability that an individual with genotype i-1 is an outlier. So the return vector has length ploidy plus 1.
If using genotype log-likelihoods, element i is the probability that individual i is an outlier. So the return vector has the same length as the number of individuals.
These outlier probabilities are only valid if the null of no segregation is true.
Note that since this data frame contains the list-columns q0 and
q1, you cannot use write.csv() to save it.
You have to either remove those columns first or use something
like saveRDS()
The seg_multi() function supports parallel computing. It does
so through the future
package.
You first specify the evaluation plan with plan()
from the future package. On a local machine, this is typically
just future::plan(future::multisession, workers = nc) where
nc is the number of workers you want. You can find the maximum
number of possible workers with availableCores().
You then run seg_multi(), then shut down the workers with
future::plan(future::sequential). The pseudo code is
future::plan(future::multisession, workers = nc) seg_multi() future::plan(future::sequential)
The gamete frequencies under the null model can be calculated via
gamfreq(). The genotype frequencies, which are just
a discrete linear convolution (convolve()) of the
gamete frequencies, can be calculated via gf_freq().
The null model's gamete frequencies for true autopolyploids
(model = "auto") or
true allopolyploids (model = "allo") are given in the seg data frame
that comes with this package. I only made that data frame go up to
ploidy 20, but let me know if you need it for higher ploidies.
The polyRAD folks test for full autopolyploid and full allopolyploid, so I
included that as an option (model = "auto_allo").
We can account for arbitrary levels of double reduction in autopolyploids
(model = "auto_dr") using the gamete frequencies from
Huang et al (2019).
The null model for segmental allopolyploids (model = "allo_pp") is the mixture model of
the possible allopolyploid gamete frequencies. The autopolyploid model
(without double reduction) is a subset of this mixture model.
In the above mixture model, we can account for double reduction for simplex
loci (model = "seg") by just slightly reducing the
number of simplex gametes and increasing the number of duplex and
nullplex gametes. That is, the frequencies for (nullplex, simplex, duplex)
gametes go from (0.5, 0.5, 0) to
(0.5 + b, 0.5 - 2 * b, b).
model = "seg" is the most general, so it is the default. But you
should use other models if you have more information on your species. E.g.
if you know you have an autopolyploid, use either model = "auto"
or model = "auto_dr".
Do NOT interpret the estimated parameters in the null$gam list.
These parameters are weakly identified (I had to do some fancy
spectral methods to account for this in the null distribution
of the tests). Even though they are technically identified, you would
need a massive data set to be able to estimate them accurately.
David Gerard
Gerard, D, Ambrosano, GB, Pereira, GdS, & Garcia, AAF (2025). Tests for segregation distortion in higher ploidy F1 populations. G3: Genes | Genomes | Genetics, 15(11), jkaf212. doi:10.1093/g3journal/jkaf212
seg_lrt() Single locus LRT for segregation distortion.
gamfreq() Gamete frequencies under various models of meiosis
gf_freq() F1 genotype frequencies under various models of meiosis.
multidog_to_g() Converts the output of updog::multidog() into something that you can input into seg_multi().
## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g s1 <- seg_multi( g = g_1, p1_ploidy = 4, p2_ploidy = 4, p1 = p1_1, p2 = p2_1) s1[, c("snp", "p_value")] ## Put NULL if you have absolutely no information on the parents s2 <- seg_multi(g = g_1, p1_ploidy = 4, p2_ploidy = 4, p1 = NULL, p2 = NULL) s2[, c("snp", "p_value")] ## Using genotype likelihoods (typically a good idea) ## Also demonstrate parallelization through future package. glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g # future::plan(future::multisession, workers = 2) # s3 <- seg_multi( # g = g_2, # p1_ploidy = 4, # p2_ploidy = 4, # p1 = p1_2, # p2 = p2_2, # ret_out = TRUE) # future::plan(future::sequential) # s3[, c("snp", "p_value")] ## Outlier probabilities are returned if `ret_out = TRUE` # graphics::plot(s3$outprob[[6]], ylim = c(0, 1))## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g s1 <- seg_multi( g = g_1, p1_ploidy = 4, p2_ploidy = 4, p1 = p1_1, p2 = p2_1) s1[, c("snp", "p_value")] ## Put NULL if you have absolutely no information on the parents s2 <- seg_multi(g = g_1, p1_ploidy = 4, p2_ploidy = 4, p1 = NULL, p2 = NULL) s2[, c("snp", "p_value")] ## Using genotype likelihoods (typically a good idea) ## Also demonstrate parallelization through future package. glist <- multidog_to_g( mout = ufit, ploidy = 4, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g # future::plan(future::multisession, workers = 2) # s3 <- seg_multi( # g = g_2, # p1_ploidy = 4, # p2_ploidy = 4, # p1 = p1_2, # p2 = p2_2, # ret_out = TRUE) # future::plan(future::sequential) # s3[, c("snp", "p_value")] ## Outlier probabilities are returned if `ret_out = TRUE` # graphics::plot(s3$outprob[[6]], ylim = c(0, 1))
Simulate genotype counts from F1 individuals
simf1g(n, g1, g2, alpha = 0, xi1 = 1/3, xi2 = 1/3)simf1g(n, g1, g2, alpha = 0, xi1 = 1/3, xi2 = 1/3)
n |
Sample size. |
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
alpha |
The double reduction rate. |
xi1 |
The first parent's preferential pairing parameter. |
xi2 |
The second parent's preferential pairing parameter. |
A vector of counts, where element i is the number of
simulated individuals with genotype i-1.
David Gerard
set.seed(1) simf1g(n = 10, g1 = 1, g2 = 2)set.seed(1) simf1g(n = 10, g1 = 1, g2 = 2)
Simulate genotype likelihoods of F1 individuals.
simf1gl(n, g1, g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3)simf1gl(n, g1, g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3)
n |
Sample size. |
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
rd |
The read depth. |
alpha |
The double reduction rate. |
xi1 |
The first parent's preferential pairing parameter. |
xi2 |
The second parent's preferential pairing parameter. |
The matrix of offspring genotype log-likelihoods.
David Gerard
set.seed(1) simf1gl(n = 10, g1 = 1, g2 = 2)set.seed(1) simf1gl(n = 10, g1 = 1, g2 = 2)
Provide a vector of genotype counts and this will return a matrix of genotype log-likelihoods.
simgl(nvec, rd = 10, seq = 0.01, bias = 1, od = 0.01)simgl(nvec, rd = 10, seq = 0.01, bias = 1, od = 0.01)
nvec |
A vector of counts. |
rd |
Read depth. Lower is more uncertain. |
seq |
Sequencing error rate. Higher means more uncertain. |
bias |
Bias. 1 means no bias. |
od |
Overdispersion. Typical value is like 0.01. Higher means more uncertain. |
A matrix of genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. This is natural log (base e).
David Gerard
set.seed(1) simgl(nvec = c(1, 2, 1, 1, 3))set.seed(1) simgl(nvec = c(1, 2, 1, 1, 3))
Convert from three parameters to two parameters
three_to_two(tau, beta, gamma)three_to_two(tau, beta, gamma)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma |
Probability of AA/aa pairing given bivalent formation |
A vector of length two. The first is the double reduction rate
(alpha), and the second is the preferential pairing parameter
(xi).
David Gerard
three_to_two(tau = 0.1, beta = 1/6, gamma = 1/4)three_to_two(tau = 0.1, beta = 1/6, gamma = 1/4)
A subset of data from Cappai et al. (2020), fit using
multidog(). This just contains a random set of 10 loci.
ufit ufit2 ufit3ufit ufit2 ufit3
An object of type multidog output from multidog().
ufitUses the model = "norm" option.
ufit2Uses the model = "f1pp" option.
ufit3Uses the model = "f1" option.
An object of class multidog of length 2.
An object of class multidog of length 2.
Cappai, F., Amadeu, R. R., Benevenuto, J., Cullen, R., Garcia, A., Grossman, A., Ferrão, L., & Munoz, P. (2020). High-resolution linkage map and QTL analyses of fruit firmness in autotetraploid blueberry. Frontiers in plant science, 11, 562171. doi:10.3389/fpls.2020.562171.