Warning message: In eval(family$initialize) : non-integer #successes in a binomial glm!

Answer

As @HongOoi said, glm.fit with the binomial family expects integer counts and throws a warning otherwise; if you want non-integer counts, use the quasibinomial family. The rest of my answer compares the two.

In R, quasibinomial gives exactly the same results as binomial in glm.fit for the coefficient estimates (as mentioned in the comments by @HongOoi), but not for the standard errors (as mentioned in the comment by @nograpes).

Comparison of source code

A diff on the source code of stats::binomial and stats::quasibinomial shows the following changes:

the text "binomial" becomes "quasibinomial"
the aic function returns NA instead of the calculated AIC

and the following removals:

setting outcomes to 0 when weights = 0
check on integrality of weights
simfun function to simulate data

Only simfun could make a difference, but the source code of glm.fit shows no use of that function, unlike other fields in the object returned by stats::binomial such as mu.eta and link.
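The comparison above can be repeated from the R console. This is a rough sketch rather than a full diff: it lists the fields that differ between the two family objects, checks a few shared functions on an arbitrary test grid (the grid `eta` is my own choice), and searches the deparsed source of glm.fit for the string "simulate", which is assumed here to be the field name under which simfun is stored.

```r
# Build both family objects with the same link
fam_b <- stats::binomial(link = "logit")
fam_q <- stats::quasibinomial(link = "logit")

# Fields present in the binomial family object but not in the quasibinomial one
print(setdiff(names(fam_b), names(fam_q)))

# Shared functions should agree numerically on an arbitrary test grid
eta <- seq(-3, 3, by = 0.5)
mu <- fam_b$linkinv(eta)
same_linkinv <- isTRUE(all.equal(fam_b$linkinv(eta), fam_q$linkinv(eta)))
same_mu_eta <- isTRUE(all.equal(fam_b$mu.eta(eta), fam_q$mu.eta(eta)))
same_variance <- isTRUE(all.equal(fam_b$variance(mu), fam_q$variance(mu)))
print(c(linkinv = same_linkinv, mu.eta = same_mu_eta, variance = same_variance))

# The quasibinomial aic function returns NA instead of a computed AIC
aic_q <- fam_q$aic(y = 1, n = 1, mu = 0.5, wt = 1, dev = 0)
print(aic_q)

# Does the source of glm.fit mention the simulate field at all?
glm_fit_src <- deparse(stats::glm.fit)
print(any(grepl("simulate", glm_fit_src, fixed = TRUE)))
```

The exact set of fields may vary between R versions, so treat the printed names as informative rather than definitive.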

Minimal working example

The results from using quasibinomial or binomial are the same for the coefficients in this minimal working example:

    library('MASS')
    library('stats')

    # Simulate n observations with p covariates and non-integer weights
    gen_data <- function(n=100, p=3) {

      set.seed(1)
      weights <- stats::rgamma(n=n, shape=rep(1, n), rate=rep(1, n))
      y <- stats::rbinom(n=n, size=1, prob=0.5)
      theta <- stats::rnorm(n=p, mean=0, sd=1)
      means <- colMeans(as.matrix(y) %*% theta)
      x <- MASS::mvrnorm(n=n, means, diag(1, p, p))

      return(list(x=x, y=y, weights=weights, theta=theta))
    }

    # Fit the same simulated data with the given family
    fit_glm <- function(family) {
      data <- gen_data()
      fit <- stats::glm.fit(x = data$x,
                            y = data$y,
                            weights = data$weights,
                            family = family)
      return(fit)
    }

    fit1 <- fit_glm(family=stats::binomial(link = "logit"))
    fit2 <- fit_glm(family=stats::quasibinomial(link = "logit"))

    all(fit1$coefficients == fit2$coefficients)

Comparison with the quasibinomial probability distribution

This thread suggests that the quasibinomial distribution differs from the binomial distribution by an additional parameter phi. But the term means different things in statistics and in R.

First, nowhere does the source code of quasibinomial mention that additional phi parameter.

Second, a quasiprobability is similar to a probability, but not a proper one. In this case, one cannot compute the term (n \choose k) when the numbers are non-integers, although one could with the Gamma function. This may be a problem for the definition of the probability distribution but is irrelevant for estimation, as the term (n choose k) does not depend on the parameter and falls out of optimisation.
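To illustrate the point about (n choose k), here is a small sketch with made-up numbers. The helper `log_choose_gamma` and the toy vectors `w` and `y` are my own illustration, not part of the original example. It extends the binomial coefficient to non-integers through the Gamma function and checks that adding this parameter-free term does not move the maximiser of a one-parameter weighted binomial log-likelihood.

```r
# Generalised log binomial coefficient via the Gamma function
# (hypothetical helper for illustration)
log_choose_gamma <- function(n, k) {
  lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
}

# Sanity check: agrees with base R for integer inputs
agrees_with_lchoose <- isTRUE(all.equal(log_choose_gamma(10, 3), lchoose(10, 3)))

# Toy data: non-integer "trial counts" w and success proportions y,
# so the number of successes w * y is not an integer
w <- c(0.7, 1.3, 2.5, 0.4)
y <- c(0.5, 0.2, 0.8, 0.3)

# Log-likelihood in a single success probability p, without and with the constant
loglik_kernel <- function(p) sum(w * (y * log(p) + (1 - y) * log(1 - p)))
loglik_full <- function(p) loglik_kernel(p) + sum(log_choose_gamma(w, w * y))

opt_kernel <- optimize(loglik_kernel, interval = c(0.001, 0.999), maximum = TRUE)
opt_full <- optimize(loglik_full, interval = c(0.001, 0.999), maximum = TRUE)

# Closed-form maximiser of the kernel: weighted mean of the proportions
p_closed_form <- sum(w * y) / sum(w)

print(c(kernel = opt_kernel$maximum, full = opt_full$maximum, closed = p_closed_form))
```

Both numerical maximisers should land on the weighted mean, up to the tolerance of `optimize`, because the two objective functions differ only by a constant.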

The log-likelihood estimator is:

[Equation: log-likelihood estimator]

The log-likelihood as a function of theta with the binomial family is:

[Equation: log-likelihood with binomial family]

where the constant is independent of the parameter theta, so it falls out of optimisation.
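For concreteness, the two expressions can be sketched as follows. The notation is an assumption on my part: $w_i$ are the prior weights, $y_i$ the responses as proportions in $[0, 1]$, $x_i$ the covariate rows, and $p_i(\theta)$ the fitted probabilities under the logit link. The constant is written with the Gamma function so that it stays defined for non-integer $w_i y_i$.

```latex
\begin{align}
\hat{\theta} &= \arg\max_{\theta} \, \ell(\theta) \\
\ell(\theta) &= \sum_{i=1}^{n} w_i \Bigl[ y_i \log p_i(\theta)
                + (1 - y_i) \log\bigl(1 - p_i(\theta)\bigr) \Bigr] + \text{const} \\
p_i(\theta) &= \frac{1}{1 + e^{-x_i^{\top} \theta}} \\
\text{const} &= \sum_{i=1}^{n} \log
                \frac{\Gamma(w_i + 1)}{\Gamma(w_i y_i + 1)\, \Gamma(w_i - w_i y_i + 1)}
\end{align}
```

Only the bracketed sum depends on $\theta$, which is why the binomial and quasibinomial fits share the same maximiser.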

Comparison of standard errors

The standard errors calculated by stats::summary.glm use a different dispersion value for the binomial and quasibinomial families, as mentioned in the documentation of stats::summary.glm:

The dispersion of a GLM is not used in the fitting process, but it is needed to find standard errors. If dispersion is not supplied or NULL, the dispersion is taken as 1 for the binomial and Poisson families, and otherwise estimated by the residual Chisquared statistic (calculated from cases with non-zero weights) divided by the residual degrees of freedom.

...

cov.unscaled: the unscaled (dispersion = 1) estimated covariance matrix of the estimated coefficients.

cov.scaled: ditto, scaled by dispersion.
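The dispersion estimate described in the quoted documentation can be reproduced by hand. The sketch below uses its own small simulated data set (the seed, sample size and coefficient are arbitrary choices of mine) and stats::glm rather than glm.fit. It computes the Pearson chi-squared statistic divided by the residual degrees of freedom and compares it with what summary reports.

```r
# Small simulated data set with non-integer prior weights (arbitrary settings)
set.seed(42)
n <- 50
x <- rnorm(n)
w <- rgamma(n, shape = 1, rate = 1)
y <- rbinom(n, size = 1, prob = plogis(0.5 * x))

# Quasibinomial fit: the dispersion is estimated rather than fixed at 1
fit <- glm(y ~ x, family = quasibinomial(link = "logit"), weights = w)
s <- summary(fit)

# Residual chi-squared statistic divided by the residual degrees of freedom
pearson <- residuals(fit, type = "pearson")
phi_hat <- sum(pearson^2) / fit$df.residual

print(c(by_hand = phi_hat, from_summary = s$dispersion))

# The scaled covariance matrix is the unscaled one multiplied by the dispersion
scaled_matches <- isTRUE(all.equal(s$cov.scaled, phi_hat * s$cov.unscaled))
print(scaled_matches)
```

With the binomial family the same summary call would fix the dispersion at 1, so cov.scaled and cov.unscaled would coincide.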

Using the above minimal working example:

    summary1 <- stats::summary.glm(fit1)
    summary2 <- stats::summary.glm(fit2)

    print("Equality of unscaled variance-covariance-matrix:")
    all(summary1$cov.unscaled == summary2$cov.unscaled)

    print("Equality of variance-covariance matrix scaled by `dispersion`:")
    all(summary1$cov.scaled == summary2$cov.scaled)

    print(summary1$coefficients)
    print(summary2$coefficients)

shows the same coefficients, the same unscaled variance-covariance matrix, and different scaled variance-covariance matrices:

    [1] "Equality of unscaled variance-covariance-matrix:"
    [1] TRUE
    [1] "Equality of variance-covariance matrix scaled by `dispersion`:"
    [1] FALSE
           Estimate Std. Error   z value   Pr(>|z|)
    [1,] -0.3726848  0.1959110 -1.902317 0.05712978
    [2,]  0.5887384  0.2721666  2.163155 0.03052930
    [3,]  0.3161643  0.2352180  1.344133 0.17890528
           Estimate Std. Error   t value   Pr(>|t|)
    [1,] -0.3726848  0.1886017 -1.976042 0.05099072
    [2,]  0.5887384  0.2620122  2.246988 0.02690735
    [3,]  0.3161643  0.2264421  1.396226 0.16583365
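As a quick arithmetic check on the printed output, the two sets of standard errors should differ by one common factor, the square root of the estimated dispersion. The snippet below only re-uses the numbers printed above; the variable names are my own.

```r
# Standard errors copied from the two coefficient tables above
se_binomial <- c(0.1959110, 0.2721666, 0.2352180)
se_quasi <- c(0.1886017, 0.2620122, 0.2264421)

# The ratio should be (nearly) identical for every coefficient
ratio <- se_quasi / se_binomial
print(ratio)

# Squaring the ratio gives the dispersion estimate implied by the output
implied_dispersion <- ratio^2
print(implied_dispersion)
```

The ratio is the same for all three coefficients, roughly 0.96, so here the quasibinomial standard errors are slightly smaller than the binomial ones. The quasibinomial table also reports t rather than z statistics because the dispersion is estimated.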
