Title: | Tests for Segregation Distortion in Polyploids |
---|---|
Description: | Provides a suite of tests for segregation distortion in F1 polyploid populations (for now, just tetraploids). This is under different assumptions of meiosis. Details of these methods are described in Gerard et al. (2024) <doi:10.1101/2024.02.07.579361>. This material is based upon work supported by the National Science Foundation under Grant No. 2132247. The opinions, findings, and conclusions or recommendations expressed are those of the author and do not necessarily reflect the views of the National Science Foundation. |
Authors: | David Gerard [aut, cre] , Mira Thakkar [aut], NSF DBI 2132247 [fnd] (https://www.nsf.gov/awardsearch/showAward?AWD_ID=2132247) |
Maintainer: | David Gerard <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.1 |
Built: | 2024-12-21 05:12:06 UTC |
Source: | https://github.com/dcgerard/segtest |
This chi-squared test is run under the assumption of no double reduction and no preferential pairing.
chisq_g4(x, g1, g2)
chisq_g4(x, g1, g2)
x |
Vector of observed genotype counts |
g1 |
Parent 1's genotype |
g2 |
Parent 2's genotype |
A list containing the chi-squared statistic, degrees of freedom, and p-value.
Mira Thakkar and David Gerard
x <- c(1, 2, 4, 3, 0) g1 <- 2 g2 <- 2 chisq_g4(x, g1, g2) x <- c(10, 25, 10, 0, 0) g1 <- 1 g2 <- 1 chisq_g4(x, g1, g2)
x <- c(1, 2, 4, 3, 0) g1 <- 2 g2 <- 2 chisq_g4(x, g1, g2) x <- c(10, 25, 10, 0, 0) g1 <- 1 g2 <- 1 chisq_g4(x, g1, g2)
Calculates the MLE genotype and runs a chi-squared test assuming no double reduction and no preferential pairing.
chisq_gl4(gl, g1, g2)
chisq_gl4(gl, g1, g2)
gl |
A matrix of offspring genotype log-likelihoods. The rows index the
individuals and the columns index the possible genotypes. So
|
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
A list containing the chi-squared statistic, degrees of freedom, and p-value.
Mira Thakkar and David Gerard
## null sim set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 0, xi2 = 1/3) chisq_gl4(gl = gl, g1 = g1, g2 = g2)
## null sim set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 0, xi2 = 1/3) chisq_gl4(gl = gl, g1 = g1, g2 = g2)
EM algorithm to estimate prior genotype probabilities from genotype likelihoods.
em_li(B, itermax = 100L, eps = 1e-05)
em_li(B, itermax = 100L, eps = 1e-05)
B |
Matrix of genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. |
itermax |
The maximum number of iterations. |
eps |
The stopping criteria. |
A vector of log prior probabilities for each genotype.
David Gerard
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987-2993. doi:10.1093/bioinformatics/btr509
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Run em lprob <- em_li(B = gl) # Exponentiate to get probabilities prob <- exp(c(lprob)) prob
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Run em lprob <- em_li(B = gl) # Exponentiate to get probabilities prob <- exp(c(lprob)) prob
Converts genotype counts to genotype vectors.
gcount_to_gvec(gcount)
gcount_to_gvec(gcount)
gcount |
The vector of genotype counts. |
A vector of length sum(gcount)
, containing gcount[1]
copies of 0
, gcount[2]
copies of 1
, gcount[3]
copies of 2
, etc.
David Gerard
gcount <- c(1, 2, 3, 0, 5) gcount_to_gvec(gcount = gcount)
gcount <- c(1, 2, 3, 0, 5) gcount_to_gvec(gcount = gcount)
gcount_to_gvec()
.Inverse function of gcount_to_gvec()
.
gvec_to_gcount(gvec, ploidy = 4)
gvec_to_gcount(gvec, ploidy = 4)
gvec |
The vector of genotypes. |
ploidy |
The ploidy of the species. |
A vector of counts. Element k
is the number of individuals
with genotype k-1
.
David Gerard
gvec <- c(1, 2, 3, 2, 3, 1, 4, 0, 1, 0, 0, 1, 0, 0) gvec_to_gcount(gvec = gvec)
gvec <- c(1, 2, 3, 2, 3, 1, 4, 0, 1, 0, 0, 1, 0, 0) gvec_to_gcount(gvec = gvec)
There is a dependence on the bounds of two-parameter model. This function returns TRUE if those bounds are satisfied and FALSE otherwise.
is_valid_2(dr, pp, drbound = 1/6)
is_valid_2(dr, pp, drbound = 1/6)
dr |
The double reduction rate. |
pp |
The preferential pairing parameter. |
drbound |
The maximum double reduction rate possible. |
TRUE
if the model is valid, FALSE
otherwise.
David Gerard
TOL <- 1e-6 is_valid_2(dr = 1/6, pp = 1/3, drbound = 1/6) # Valid is_valid_2(dr = 1/6, pp = 1/3 - TOL, drbound = 1/6) # Not valid is_valid_2(dr = 1/6, pp = 1/3 + TOL, drbound = 1/6) # Not valid
TOL <- 1e-6 is_valid_2(dr = 1/6, pp = 1/3, drbound = 1/6) # Valid is_valid_2(dr = 1/6, pp = 1/3 - TOL, drbound = 1/6) # Not valid is_valid_2(dr = 1/6, pp = 1/3 + TOL, drbound = 1/6) # Not valid
Iterator over array
## S3 method for class 'array' iter(obj, by = 1, recycle = FALSE, ...)
## S3 method for class 'array' iter(obj, by = 1, recycle = FALSE, ...)
obj |
An array. |
by |
The dimension to iterate over. |
recycle |
Should the iterator reset? |
... |
not used |
An iterator. This is an S3 arrayiter object, used in conjunction with nextElem to iterate over one index of an array.
David Gerard
glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
This is under the two parameter model.
like_gknown_2(x, alpha, xi1, xi2, g1, g2, log_p = TRUE, pen = 0)
like_gknown_2(x, alpha, xi1, xi2, g1, g2, log_p = TRUE, pen = 0)
x |
A vector of length 5. |
alpha |
The double reduction rate. |
xi1 |
The preferential pairing parameter of parent 1. |
xi2 |
The preferential pairing parameter of parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
pen |
A tiny penalty to help with numerical stability |
The (log) likelihood.
David Gerard
x <- c(1, 4, 5, 3, 1) alpha <- 0.01 xi1 <- 0.5 xi2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_2( x = x, alpha = alpha, xi1 = xi1, xi2 = xi2, g1 = g1, g2 = g2)
x <- c(1, 4, 5, 3, 1) alpha <- 0.01 xi1 <- 0.5 xi2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_2( x = x, alpha = alpha, xi1 = xi1, xi2 = xi2, g1 = g1, g2 = g2)
This is under the three parameter model.
like_gknown_3(x, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE, pen = 0)
like_gknown_3(x, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE, pen = 0)
x |
A vector of length 5. |
tau |
The probability of quadrivalent formation. |
beta |
The probability of double reduction given quadrivalent formation. |
gamma1 |
The probability of AA_aa pairing for parent 1. |
gamma2 |
The probability of AA_aa pairing for parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
pen |
A tiny penalty to help with numerical stability |
The (log) likelihood.
David Gerard
x <- c(1, 4, 5, 3, 1) tau <- 0.5 beta <- 0.1 gamma1 <- 0.5 gamma2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_3( x = x, tau = tau, beta = beta, gamma1 = gamma1, gamma2 = gamma2, g1 = g1, g2 = g2)
x <- c(1, 4, 5, 3, 1) tau <- 0.5 beta <- 0.1 gamma1 <- 0.5 gamma2 <- 0.3 g1 <- 1 g2 <- 2 like_gknown_3( x = x, tau = tau, beta = beta, gamma1 = gamma1, gamma2 = gamma2, g1 = g1, g2 = g2)
This is under the two parameter model.
like_glpknown_2(gl, alpha, xi1, xi2, g1, g2, log_p = TRUE)
like_glpknown_2(gl, alpha, xi1, xi2, g1, g2, log_p = TRUE)
gl |
The matrix of genotype likelihoods of the offspring. Rows index The individuals, columns index the genotypes. |
alpha |
The double reduction rate. |
xi1 |
The preferential pairing parameter of parent 1. |
xi2 |
The preferential pairing parameter of parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
The (log) likelihood of the two parameter model when using genotype likelihoods.
David Gerard
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_2( gl = gl, alpha = 0.01, xi1 = 0.5, xi2 = 0.3, g1 = g1, g2 = g2, log_p = TRUE)
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_2( gl = gl, alpha = 0.01, xi1 = 0.5, xi2 = 0.3, g1 = g1, g2 = g2, log_p = TRUE)
This is under the three parameter model.
like_glpknown_3(gl, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE)
like_glpknown_3(gl, tau, beta, gamma1, gamma2, g1, g2, log_p = TRUE)
gl |
The matrix of genotype likelihoods of the offspring. Rows index The individuals, columns index the genotypes. |
tau |
The probability of quadrivalent formation. |
beta |
The probability of double reduction given quadrivalent formation. |
gamma1 |
The probability of AA_aa pairing for parent 1. |
gamma2 |
The probability of AA_aa pairing for parent 2. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
log_p |
A logical. Should we return the log likelihood or not? |
The (log) likelihood of the three parameter model when using genotype likelihoods.
David Gerard
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_3( gl = gl, tau = 1/2, beta = 1/12, gamma1 = 1/3, gamma2 = 1/3, g1 = g1, g2 = g2, log_p = TRUE)
g1 <- 1 g2 <- 0 gl <- simf1gl( n = 25, g1 = g1, g2 = g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3) like_glpknown_3( gl = gl, tau = 1/2, beta = 1/12, gamma1 = 1/3, gamma2 = 1/3, g1 = g1, g2 = g2, log_p = TRUE)
em_li()
Objective function for em_li()
llike_li(B, lpivec)
llike_li(B, lpivec)
B |
The log-likelihood matrix. Rows are individuals columns are genotypes. |
lpivec |
The log prior vector. |
The log-likelihood of a vector of genotype frequencies when using genotype likelihoods. This is from Li (2011).
David Gerard
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), 2987-2993. doi:10.1093/bioinformatics/btr509
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Log-likelihood at given log-priors prob <- c(0.1, 0.2, 0.4, 0.2, 0.1) lprob <- log(prob) llike_li(B = gl, lpivec = lprob)
# Simulate some data set.seed(1) gl <- simgl(nvec = c(3, 2, 4, 1, 2)) # Log-likelihood at given log-priors prob <- c(0.1, 0.2, 0.4, 0.2, 0.1) lprob <- log(prob) llike_li(B = gl, lpivec = lprob)
This will run a likelihood ratio test using the genotypes of an F1 population of tetraploids for the null of Mendelian segregation (accounting for double reduction and preferential pairing) against the alternative of segregation distortion. This is when the genotypes are assumed known.
lrt_men_g4( x, g1, g2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
lrt_men_g4( x, g1, g2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
x |
A vector of genotype counts. |
g1 |
The genotype of parent 1. |
g2 |
The genotype of parent 2. |
drbound |
The maximum rate of double reduction. A default of 1/6 is provided, which is the rate under the complete equational segregation model of meiosis. |
pp |
A logical. Should we account for preferential pairing
( |
dr |
A logical. Should we account for double reduction
( |
alpha |
If |
xi1 |
If |
xi2 |
If |
A list with the following elements
statistic
The log-likelihood ratio test statistic.
df
The degrees of freedom.
p_value
The p-value.
alpha
The estimated double reduction rate.
xi1
The estimated preferential pairing parameter of parent 1.
xi2
The estimated preferential pairing parameter of parent 2.
Some offspring genotype combinations are impossible given the parental genotypes. If these impossible genotypes combinations show up, we return a p-value of 0, a log-likelihood ratio statistic of Infinity, and missing values for all other return items. The impossible genotypes are:
g1 = 0 && g2 = 0
Only offspring genotypes of 0 are possible.
g1 = 4 && g2 = 4
Only offspring genotypes of 4 are possible.
g1 = 0 && g2 = 4 || g1 == 4 && g2 == 0
Only offspring genotypes of 2 are possible.
g1 = 0 && g2 %in% c(1, 2, 3) || g1 = %in% c(1, 2, 3) && g2 == 0
Only offspring genotypes of 0, 1, and 2 are possible.
g1 = 4 && g2 %in% c(1, 2, 3) || g1 = %in% c(1, 2, 3) && g2 == 4
Only offspring genotypes of 2, 3, and 4 are possible.
When g1 = 2
or g2 = 2
(or both), the model is not identified
and those estimates (alpha
, xi1
, and xi2
) are
meaningless. Do NOT interpret them.
The estimate of alpha
(double reduction rate) IS identified as
long as at least one parent is simplex, and no parent is duplex.
However, the estimates of the double reduction rate have extremely high
variance.
David Gerard
set.seed(100) gf <- offspring_gf_2(alpha = 1/12, xi1 = 0.2, xi2 = 0.6, p1 = 1, p2 = 0) x <- offspring_geno(gf = gf, n = 100) lrt_men_g4(x = x, g1 = 1, g2 = 0)
set.seed(100) gf <- offspring_gf_2(alpha = 1/12, xi1 = 0.2, xi2 = 0.6, p1 = 1, p2 = 0) x <- offspring_geno(gf = gf, n = 100) lrt_men_g4(x = x, g1 = 1, g2 = 0)
This will run a likelihood ratio test using the genotypes of an F1 population of tetraploids for the null of Mendelian segregation (accounting for double reduction and preferential pairing) against the alternative of segregation distortion. This is when genotype uncertainty is accounted for through genotype likelihoods.
lrt_men_gl4( gl, g1 = NULL, g2 = NULL, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
lrt_men_gl4( gl, g1 = NULL, g2 = NULL, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
gl |
The genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. |
g1 |
Either parent 1's genotype, or parent 1's genotype log-likelihoods. |
g2 |
Either parent 2's genotype, or parent 2's genotype log-likelihoods. |
drbound |
The upper bound on the double reduction rate. |
pp |
Is (partial) preferential pairing possible ( |
dr |
Is double reduction possible ( |
alpha |
If |
xi1 |
If |
xi2 |
If |
A list with the following elements
statistic
The log-likelihood ratio test statistic.
df
The degrees of freedom.
p_value
The p-value.
alpha
The estimated double reduction rate.
xi1
The estimated preferential pairing parameter of parent 1.
xi2
The estimated preferential pairing parameter of parent 2.
When g1 = 2
or g2 = 2
(or both), the model is not identified
and those estimates (alpha
, xi1
, and xi2
) are
meaningless. Do NOT interpret them.
The estimate of alpha
(double reduction rate) IS identified as
long as at least one parent is simplex, and no parent is duplex.
However, the estimates of the double reduction rate have extremely high
variance.
David Gerard
## null simulation set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 1/12, xi2 = 1/2) ## LRT when parent genotypes are known. lrt_men_gl4(gl = gl, g1 = g1, g2 = g2) ## LRT when parent genotypes are not known lrt_men_gl4(gl = gl) ## Alternative simulation gl <- simgl(nvec = rep(5, 5)) lrt_men_gl4(gl = gl, g1 = g1, g2 = g2)
## null simulation set.seed(1) g1 <- 2 g2 <- 2 gl <- simf1gl(n = 25, g1 = g1, g2 = g2, alpha = 1/12, xi2 = 1/2) ## LRT when parent genotypes are known. lrt_men_gl4(gl = gl, g1 = g1, g2 = g2) ## LRT when parent genotypes are not known lrt_men_gl4(gl = gl) ## Alternative simulation gl <- simgl(nvec = rep(5, 5)) lrt_men_gl4(gl = gl, g1 = g1, g2 = g2)
Uses the future
package to implement parallelization support for
the likelihood ratio tests for segregation distortion. Right now, this is
only supported for tetraploids (allo, auto, or segmental). This function
is only somewhat tested (the single-locus LRT functions in the "See Also"
section are very well tested). So please send any bugs you notice to
https://github.com/dcgerard/segtest/issues.
multi_lrt( g, p1, p2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3, nullprop = FALSE )
multi_lrt( g, p1, p2, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3, nullprop = FALSE )
g |
One of two inputs
|
p1 |
One of three inputs
|
p2 |
One of three inputs
|
drbound |
The upper bound on the double reduction rate. |
pp |
Is (partial) preferential pairing possible ( |
dr |
Is double reduction possible ( |
alpha |
If |
xi1 |
If |
xi2 |
If |
nullprop |
Should we return the null proportions ( |
A data frame with the following elements:
statistic
The likelihood ratio test statistic
p_value
The p-value of the likelihood ratio test.
df
The degrees of freedom of the test.
alpha
The MLE of the double reduction rate. Do not use for real work.
xi1
The MLE of the first parent's partial preferential pairing parameter. Do not use for real work.
xi2
The MLE of the second parent's partial preferential pairing parameter. Do not use for real work.
p1
(Estimate of) the first parent's genotype.
p2
(Estimate of) the second parent's genotype.
snp
The name of the SNP.
The multi_lrt()
function supports parallel computing. It does
so through the future
package.
You first specify the evaluation plan with plan()
from the future
package. On a local machine, this is typically
just future::plan(future::multisession, workers = nc)
where
nc
is the number of workers you want. You can find the maximum
number of possible workers with availableCores()
.
You then run multi_lrt()
, then shut down the workers with
future::plan(future::sequential)
.
David Gerard
lrt_men_g4()
Single locus LRT for segregation distortion when genotypes are known.
lrt_men_gl4()
Single locus LRT for segregation distortion when using genotype likelihoods.
## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g(mout = ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g multi_lrt(g = g_1, p1 = p1_1, p2 = p2_1) ## Using genotype likelihoods (typically a good idea) glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) ## Offspring genotype likelihoods and parent genotypes known multi_lrt(g = g_2, p1 = p1_1, p2 = p2_1) ## Offspring genotype likelihoods and no information on parent genotypes multi_lrt(g = g_2, p1 = NULL, p2 = NULL) ## Parallel computing is supported through the future package future::plan(future::multisession, workers = 2) multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) future::plan(future::sequential)
## Assuming genotypes are known (typically a bad idea) glist <- multidog_to_g(mout = ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") p1_1 <- glist$p1 p2_1 <- glist$p2 g_1 <- glist$g multi_lrt(g = g_1, p1 = p1_1, p2 = p2_1) ## Using genotype likelihoods (typically a good idea) glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") p1_2 <- glist$p1 p2_2 <- glist$p2 g_2 <- glist$g multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) ## Offspring genotype likelihoods and parent genotypes known multi_lrt(g = g_2, p1 = p1_1, p2 = p2_1) ## Offspring genotype likelihoods and no information on parent genotypes multi_lrt(g = g_2, p1 = NULL, p2 = NULL) ## Parallel computing is supported through the future package future::plan(future::multisession, workers = 2) multi_lrt(g = g_2, p1 = p1_2, p2 = p2_2) future::plan(future::sequential)
Converts multidog output to a format usable for multi_lrt()
multidog_to_g( mout, type = c("off_gl", "all_gl", "all_g", "off_g"), p1 = NULL, p2 = NULL, ploidy = 4 )
multidog_to_g( mout, type = c("off_gl", "all_gl", "all_g", "off_g"), p1 = NULL, p2 = NULL, ploidy = 4 )
mout |
The output of |
type |
. |
p1 |
The first parent name if using |
p2 |
The second parent name if using |
ploidy |
The ploidy. Note that most methods in this package
(including those in |
A list with the following elements
g
Either a matrix of counts, where the columns index the genotype
and the rows index the loci (type = "all_g"
or
type = "off_g"
). Or an array of genotype (natural) log-likelihoods
where the rows index the loci, the columns index the
individuals, and the slices index the genotypes
(type = "all_gl"
or type = "off_gl"
).
p1
Either a vector of known parental genotypes
(type = "off_gl"
, type = "all_g"
or type = "off_g"
).
Or a matrix of genotype (natural) log-likelihoods where the
rows index the loci and the columns index the genotypes
(type = "all_gl"
).
p2
Either a vector of known parental genotypes
(type = "off_gl"
, type = "all_g"
or type = "off_g"
).
Or a matrix of genotype (natural) log-likelihoods where the
rows index the loci and the columns index the genotypes
(type = "all_gl"
).
David Gerard
multidog_to_g(mout = ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit2, type = "off_g") multidog_to_g(mout = ufit2, type = "off_gl") multidog_to_g(mout = ufit3, type = "off_g") multidog_to_g(mout = ufit3, type = "off_gl")
multidog_to_g(mout = ufit, type = "all_g", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") multidog_to_g(mout = ufit2, type = "off_g") multidog_to_g(mout = ufit2, type = "off_gl") multidog_to_g(mout = ufit3, type = "off_g") multidog_to_g(mout = ufit3, type = "off_gl")
This is applied to an arrayiter
object to obtain the next sub-array
along one of the dimensions.
## S3 method for class 'arrayiter' nextElem(obj, ...)
## S3 method for class 'arrayiter' nextElem(obj, ...)
obj |
An arrayiter object |
... |
not used |
The next sub-array.
David Gerard
glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
glist <- multidog_to_g(mout = ufit, type = "all_gl", p1 = "indigocrisp", p2 = "sweetcrisp") g <- iterators::iter(glist$g, by = 3) head(iterators::nextElem(g)) head(iterators::nextElem(g)) head(iterators::nextElem(g))
Takes as input the offspring genotype frequencies and a sample size and returns simulated genotypes.
offspring_geno(gf, n)
offspring_geno(gf, n)
gf |
Vector of offspring genotype frequencies |
n |
Sample size |
Simulated genotypes
Mira Thakkar
set.seed(1) gf <- offspring_gf_2(alpha = 1/6, xi1 = 1/3, xi2 = 1/3, p1 = 2, p2 = 3) offspring_geno(gf = gf, n = 10)
set.seed(1) gf <- offspring_gf_2(alpha = 1/6, xi1 = 1/3, xi2 = 1/3, p1 = 2, p2 = 3) offspring_geno(gf = gf, n = 10)
Calculates offspring genotype frequencies under the two-parameter model.
offspring_gf_2(alpha, xi1, xi2 = xi1, p1, p2)
offspring_gf_2(alpha, xi1, xi2 = xi1, p1, p2)
alpha |
The double reduction rate |
xi1 |
The preferential pairing parameter of the first parent. |
xi2 |
The preferential pairing parameter of the second parent. |
p1 |
The first parent's genotype |
p2 |
The second parent's genotype |
Offspring genotype frequencies
Mira Thakkar
alpha <- 1/6 xi1 <- 1/3 xi2 <- 1/3 p1 <- 2 p2 <- 3 offspring_gf_2(alpha = alpha, xi1 = xi1, xi2 = xi2, p1 = p1, p2 = p2)
alpha <- 1/6 xi1 <- 1/3 xi2 <- 1/3 p1 <- 2 p2 <- 3 offspring_gf_2(alpha = alpha, xi1 = xi1, xi2 = xi2, p1 = p1, p2 = p2)
Calculates offspring genotype frequencies under the three-parameter model.
offspring_gf_3(tau, beta, gamma1, gamma2 = gamma1, p1, p2)
offspring_gf_3(tau, beta, gamma1, gamma2 = gamma1, p1, p2)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma1 |
Probability of AA_aa pairing in parent 1 |
gamma2 |
Probability of AA_aa pairing in parent 2 |
p1 |
The first parent's genotype |
p2 |
The second parent's genotype |
Offspring genotype frequencies
David Gerard
offspring_gf_3( tau = 1/2, beta = 1/6, gamma1 = 1/3, gamma2 = 1/3, p1 = 1, p2 = 2)
offspring_gf_3( tau = 1/2, beta = 1/6, gamma1 = 1/3, gamma2 = 1/3, p1 = 1, p2 = 2)
This is experimental. I haven't tested it out in lots of scenarios yet.
otest_g( x, g1, g2, pbad = 0.03, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
otest_g( x, g1, g2, pbad = 0.03, drbound = 1/6, pp = TRUE, dr = TRUE, alpha = 0, xi1 = 1/3, xi2 = 1/3 )
x |
A vector of genotype counts. |
g1 |
The genotype of parent 1. |
g2 |
The genotype of parent 2. |
pbad |
The upper bound on the number of bad genotypes |
drbound |
The maximum rate of double reduction. A default of 1/6 is provided, which is the rate under the complete equational segregation model of meiosis. |
pp |
A logical. Should we account for preferential pairing
( |
dr |
A logical. Should we account for double reduction
( |
alpha |
If |
xi1 |
If |
xi2 |
If |
Here, we test if the compatible genotypes are consistent with F1 populations and separately test that the number of incompatible genotypes isn't too large (less than 3 percent by default). This is the strategy the polymapR software uses. But we use a Bonferroni correction to combine these tests (minimum of two times the p-values), while they just multiply the p-values together. So our approach accounts for double reduction and preferential pairing, while also controlling the family-wise error rate.
A list with the following elements
statistic
The log-likelihood ratio test statistic.
df
The degrees of freedom.
p_value
The Bonferroni corrected p-value.
p_lrt
The p-value of the LRT.
p_binom
The p-value of the one-sided binomial test.
alpha
The estimated double reduction rate.
xi1
The estimated preferential pairing parameter of parent 1.
xi2
The estimated preferential pairing parameter of parent 2.
David Gerard
# Run a test where genotypes 0, 1, and 2 are possible x <- c(10, 10, 4, 0, 5) otest_g(x = x, g1 = 1, g2 = 0) # polymapR's multiplication and the Bonferroni differ df <- expand.grid(p1 = seq(0, 1, length.out = 20), p2 = seq(0, 1, length.out = 20)) df$polymapr <- NA df$bonferroni <- NA for (i in seq_len(nrow(df))) { df$polymapr[[i]] <- df$p1[[i]] * df$p2[[i]] df$bonferroni[[i]] <- 2 * min(c(df$p1[[i]], df$p2[[i]], 0.5)) } graphics::plot(df$polymapr, df$bonferroni)
# Run a test where genotypes 0, 1, and 2 are possible x <- c(10, 10, 4, 0, 5) otest_g(x = x, g1 = 1, g2 = 0) # polymapR's multiplication and the Bonferroni differ df <- expand.grid(p1 = seq(0, 1, length.out = 20), p2 = seq(0, 1, length.out = 20)) df$polymapr <- NA df$bonferroni <- NA for (i in seq_len(nrow(df))) { df$polymapr[[i]] <- df$p1[[i]] * df$p2[[i]] df$bonferroni[[i]] <- 2 * min(c(df$p1[[i]], df$p2[[i]], 0.5)) } graphics::plot(df$polymapr, df$bonferroni)
Takes as input (i) the parent genotypes, (ii) the offspring genotype frequency, (iii) sequencing error rate, (iv) read depth, (v) bias, and (vi) overdispersion. It returns genotype likelihoods.
po_gl( genovec, ploidy, p1_geno = NULL, p2_geno = NULL, rd = 10, seq = 0.01, bias = 1, od = 0.01 )
po_gl( genovec, ploidy, p1_geno = NULL, p2_geno = NULL, rd = 10, seq = 0.01, bias = 1, od = 0.01 )
genovec |
Offspring genotypes. |
ploidy |
Ploidy |
p1_geno |
Parent 1 genotype |
p2_geno |
Parent 2 genotype |
rd |
Read depth. Lower is more uncertain. |
seq |
Sequencing error rate. Higher means more uncertain. |
bias |
Bias. 1 means no bias. |
od |
Overdispersion. Typical value is like 0.01. Higher means more uncertain. |
Genotype likelihoods
Mira Thakkar
set.seed(1) po_gl(genovec = c(1, 2, 1, 1, 3), p1_geno = 2, p2_geno = 2, ploidy = 4)
set.seed(1) po_gl(genovec = c(1, 2, 1, 1, 3), p1_geno = 2, p2_geno = 2, ploidy = 4)
The polymapR package tests for segregation distortion by iterating through all
possible forms of disomic or polysomic inheritance from either parent,
tests for concordance of the offspring genotypes using a chi-squared
test, and returns the largest p-value. It sometimes chooses a different
p-value based on other heuristics. They also sometimes return NA.
When type = "segtest"
, we only look at patterns of the
given parent genotypes, choosing the largest p-value. When
type = "polymapR"
, we return what they use via their heuristics.
polymapr_test(x, g1 = NULL, g2 = NULL, type = c("segtest", "polymapR"))
polymapr_test(x, g1 = NULL, g2 = NULL, type = c("segtest", "polymapR"))
x |
Either a vector of genotype counts, or a matrix of genotype posteriors where the rows index the individuals and the columns index the genotypes. |
g1 |
Parent 1's genotype. |
g2 |
Parent 2's genotype. |
type |
Either my implementation which approximates that of
polymapR ( |
A list with the following elements:
The p-value of the test.
The best fit model, using the same notation as in
checkF1()
.
The frequency of invalid genotypes.
David Gerard
checkF1()
.
g1 <- 0 g2 <- 1 x <- c(4, 16, 0, 0, 0) polymapr_test(x = x, g1 = g1, g2 = g2, type = "segtest")
g1 <- 0 g2 <- 1 x <- c(4, 16, 0, 0, 0) polymapr_test(x = x, g1 = g1, g2 = g2, type = "segtest")
This is under the two parameter model.
pvec_tet_2(alpha, xi, ell)
pvec_tet_2(alpha, xi, ell)
alpha |
The double reduction rate |
xi |
The preferential pairing parameter |
ell |
The parental genotype |
The gamete genotype frequencies
Mira Thakkar and David Gerard
alpha <- 1/6 xi <- 1/3 pvec_tet_2(alpha = alpha, xi = xi, ell = 0) pvec_tet_2(alpha = alpha, xi = xi, ell = 1) pvec_tet_2(alpha = alpha, xi = xi, ell = 2) pvec_tet_2(alpha = alpha, xi = xi, ell = 3) pvec_tet_2(alpha = alpha, xi = xi, ell = 4)
alpha <- 1/6 xi <- 1/3 pvec_tet_2(alpha = alpha, xi = xi, ell = 0) pvec_tet_2(alpha = alpha, xi = xi, ell = 1) pvec_tet_2(alpha = alpha, xi = xi, ell = 2) pvec_tet_2(alpha = alpha, xi = xi, ell = 3) pvec_tet_2(alpha = alpha, xi = xi, ell = 4)
This is under the three parameter model.
pvec_tet_3(tau, beta, gamma, ell)
pvec_tet_3(tau, beta, gamma, ell)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma |
Probability of AA/aa pairing given bivalent formation |
ell |
The parent genotype |
The gamete genotype frequencies
David Gerard
pvec_tet_3(tau = 0.5, beta = 0.1, gamma = 0.5, ell = 2)
pvec_tet_3(tau = 0.5, beta = 0.1, gamma = 0.5, ell = 2)
Simulate genotype counts from F1 individuals
simf1g(n, g1, g2, alpha = 0, xi1 = 1/3, xi2 = 1/3)
simf1g(n, g1, g2, alpha = 0, xi1 = 1/3, xi2 = 1/3)
n |
Sample size. |
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
alpha |
The double reduction rate. |
xi1 |
The first parent's preferential pairing parameter. |
xi2 |
The second parent's preferential pairing parameter. |
A vector of counts, where element i
is the number of
simulated individuals with genotype i-1
.
David Gerard
set.seed(1) simf1g(n = 10, g1 = 1, g2 = 2)
set.seed(1) simf1g(n = 10, g1 = 1, g2 = 2)
Simulate genotype likelihoods of F1 individuals.
simf1gl(n, g1, g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3)
simf1gl(n, g1, g2, rd = 10, alpha = 0, xi1 = 1/3, xi2 = 1/3)
n |
Sample size. |
g1 |
The first parent's genotype. |
g2 |
The second parent's genotype. |
rd |
The read depth. |
alpha |
The double reduction rate. |
xi1 |
The first parent's preferential pairing parameter. |
xi2 |
The second parent's preferential pairing parameter. |
The matrix of offspring genotype log-likelihoods.
David Gerard
set.seed(1) simf1gl(n = 10, g1 = 1, g2 = 2)
set.seed(1) simf1gl(n = 10, g1 = 1, g2 = 2)
Provide a vector of genotype counts and this will return a matrix of genotype log-likelihoods.
simgl(nvec, rd = 10, seq = 0.01, bias = 1, od = 0.01)
simgl(nvec, rd = 10, seq = 0.01, bias = 1, od = 0.01)
nvec |
A vector of counts. |
rd |
Read depth. Lower is more uncertain. |
seq |
Sequencing error rate. Higher means more uncertain. |
bias |
Bias. 1 means no bias. |
od |
Overdispersion. Typical value is like 0.01. Higher means more uncertain. |
A matrix of genotype log-likelihoods. The rows index the individuals and the columns index the genotypes. This is natural log (base e).
David Gerard
set.seed(1) simgl(nvec = c(1, 2, 1, 1, 3))
set.seed(1) simgl(nvec = c(1, 2, 1, 1, 3))
Convert from three parameters to two parameters
three_to_two(tau, beta, gamma)
three_to_two(tau, beta, gamma)
tau |
Probability of quadrivalent formation |
beta |
Probability of double reduction given quadrivalent formation |
gamma |
Probability of AA/aa pairing given bivalent formation |
A vector of length two. The first is the double reduction rate
(alpha
), and the second is the preferential pairing parameter
(xi
).
David Gerard
three_to_two(tau = 0.1, beta = 1/6, gamma = 1/4)
three_to_two(tau = 0.1, beta = 1/6, gamma = 1/4)
A subset of data from Cappai et al. (2020), fit using
multidog()
. This just contains a random set of 10 loci.
ufit ufit2 ufit3
ufit ufit2 ufit3
An object of type multidog
output from multidog()
.
ufit
Uses the model = "norm"
option.
ufit2
Uses the model = "f1pp"
option.
ufit3
Uses the model = "f1"
option.
An object of class multidog
of length 2.
An object of class multidog
of length 2.
Cappai, F., Amadeu, R. R., Benevenuto, J., Cullen, R., Garcia, A., Grossman, A., Ferrão, L., & Munoz, P. (2020). High-resolution linkage map and QTL analyses of fruit firmness in autotetraploid blueberry. Frontiers in plant science, 11, 562171. doi:10.3389/fpls.2020.562171.