## Negative Binomial Regression | Stata Data Analysis Examples


Negative binomial regression is implemented using maximum likelihood estimation. The traditional model and the rate model with offset are demonstrated, along with regression diagnostics.

### Traditional Model

Negative binomial regression is a type of generalized linear model in which the dependent variable is a count of the number of times an event occurs. A convenient parametrization of the negative binomial distribution is given by Hilbe [1]:

P(Y = y) = Γ(y + 1/α) / (Γ(y + 1) Γ(1/α)) · (1/(1 + αμ))^(1/α) (αμ/(1 + αμ))^y,  y = 0, 1, 2, …,  (1)

where μ > 0 is the mean of Y and α > 0 is the heterogeneity (overdispersion) parameter. Hilbe [1] derives this parametrization as a Poisson-gamma mixture, or alternatively as the number of failures before the (1/α)th success, though we will not require 1/α to be an integer.
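As a quick numerical check of this parametrization (the article's own computations are in Mathematica; the Python sketch below, including its example values μ = 3, α = 0.5 and the truncation point, is ours):

```python
import math

def nb2_pmf(y, mu, alpha):
    """P(Y = y) under the NB2 parametrization (1) with mean mu and
    heterogeneity parameter alpha; the variance is mu + alpha*mu**2."""
    r = 1.0 / alpha                  # gamma shape; need not be an integer
    p = 1.0 / (1.0 + alpha * mu)     # "success" probability
    log_pmf = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)

mu, alpha = 3.0, 0.5
probs = [nb2_pmf(y, mu, alpha) for y in range(200)]   # tail beyond 200 is negligible
total = sum(probs)                                     # ~1.0
mean = sum(y * p for y, p in enumerate(probs))         # ~mu
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))  # ~mu + alpha*mu**2
```

The computed mean and variance confirm the mean-variance relationship μ + αμ² that characterizes the NB2 model.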

The traditional negative binomial regression model, designated the NB2 model in [1], is

μ = exp(β_0 + β_1 x_1 + … + β_k x_k),  (2)

where the predictor variables x_1, …, x_k are given and the population regression coefficients β_0, β_1, …, β_k are to be estimated. Given a random sample of n subjects, we observe for subject i the dependent variable y_i and the predictor variables x_i1, …, x_ik. Utilizing vector and matrix notation, we let β = (β_0, β_1, …, β_k)ᵀ, and we gather the predictor data into the design matrix X, whose ith row is (1, x_i1, …, x_ik).

Designating the ith row of X to be x_i, and exponentiating (2), we can then write the distribution (1) as

P(Y_i = y_i) = Γ(y_i + 1/α) / (Γ(y_i + 1) Γ(1/α)) · (1/(1 + α e^(x_i β)))^(1/α) (α e^(x_i β)/(1 + α e^(x_i β)))^(y_i).

We estimate β and α using maximum likelihood estimation. The likelihood function is the product of these probabilities over the sample, and the log-likelihood function is

ln L(β, α) = ∑_(i=1)^n [ y_i ln(αμ_i/(1 + αμ_i)) − (1/α) ln(1 + αμ_i) + ln Γ(y_i + 1/α) − ln Γ(y_i + 1) − ln Γ(1/α) ],  (3)

where μ_i = e^(x_i β). The values of β and α that maximize ln L will be the maximum likelihood estimates we seek, and the estimated variance-covariance matrix of the estimators is −H⁻¹, where H is the Hessian matrix of second derivatives of the log-likelihood function, evaluated at the maximum likelihood estimates. Then the variance-covariance matrix can be used to find the usual Wald confidence intervals and p-values of the coefficient estimates.
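The log-likelihood (3) can be evaluated directly from the formula above. This Python sketch (the article works in Mathematica; the hand-checkable test case at the end is our own) accepts any design matrix:

```python
import math

def nb2_loglik(beta, alpha, X, y):
    """Log-likelihood (3) for the NB2 model, with mu_i = exp(x_i . beta)."""
    r, ll = 1.0 / alpha, 0.0
    for xi, yi in zip(X, y):
        mu = math.exp(sum(b * x for b, x in zip(beta, xi)))
        ll += (math.lgamma(yi + r) - math.lgamma(yi + 1) - math.lgamma(r)
               - r * math.log(1.0 + alpha * mu)
               + yi * math.log(alpha * mu / (1.0 + alpha * mu)))
    return ll

# Check against a case computable by hand: mu = 1, alpha = 1 reduces (1) to a
# geometric distribution with P(Y = 2) = (1/2)^3, so the log-likelihood is -3 ln 2.
ll_check = nb2_loglik([0.0], 1.0, [[1.0]], [2])
```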

We will use Mathematica to replicate some examples given by Hilbe [1], who uses R and Stata. We start with simulated data generated with known regression coefficients, then recover the coefficients using maximum likelihood estimation. We will generate a sample of n observations of a dependent random variable that has a negative binomial distribution with mean given by (2), using known values of β and α. The design matrix X will contain independent standard normal variates.

Now we define and maximize the log-likelihood function (3), obtaining the estimates of β and α. Some experimentation with starting values for the search may be required, and the accuracy goal may need to be lowered; we could obtain good starting values for β using Poisson regression via GeneralizedLinearModelFit, while the starting value for α is usually a small positive number. But we arbitrarily set all starting values to 1.
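The article performs this maximization in Mathematica. As a language-neutral illustration, here is a minimal Python sketch for an intercept-only NB2 model; the sample size, the true values β_0 = ln 2 and α = 0.5, and the golden-section coordinate search are all choices made for this sketch, not taken from the article:

```python
import math, random

random.seed(7)

def rnb2(mu, alpha):
    """Draw an NB2 count as a Poisson-gamma mixture (gamma mean mu)."""
    lam = random.gammavariate(1.0 / alpha, alpha * mu)
    L, k, p = math.exp(-lam), 0, 1.0        # Knuth's Poisson sampler
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

# Simulated data: intercept-only model with true beta0 = ln 2, alpha = 0.5.
y = [rnb2(2.0, 0.5) for _ in range(1000)]

def loglik(b0, alpha):
    """Log-likelihood (3) for the intercept-only model mu = exp(b0)."""
    mu, r = math.exp(b0), 1.0 / alpha
    return sum(math.lgamma(yi + r) - math.lgamma(yi + 1) - math.lgamma(r)
               - r * math.log(1 + alpha * mu)
               + yi * math.log(alpha * mu / (1 + alpha * mu)) for yi in y)

def maximize_1d(f, a, b, tol=1e-6):
    """Golden-section search for the maximum of f on [a, b]."""
    g = (math.sqrt(5) - 1) / 2
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2

b0, alpha = 0.0, 1.0                        # arbitrary starting values
for _ in range(8):                          # alternate over the two parameters
    b0 = maximize_1d(lambda t: loglik(t, alpha), -3.0, 3.0)
    alpha = maximize_1d(lambda t: loglik(b0, t), 1e-3, 5.0)
```

A useful property for checking the result: for an intercept-only NB2 model, the MLE of μ = exp(β_0) equals the sample mean of y, for any fixed α.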

Next, we find the standard errors of the estimates. The standard errors are the square roots of the diagonal elements of the variance-covariance matrix −H⁻¹, where, as mentioned above, H is the Hessian matrix of second derivatives of the log-likelihood function.
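A finite-difference version of this computation can be sketched as follows (Python; the quadratic test function, whose exact Hessian is [[−4, −1], [−1, −6]], is an arbitrary stand-in for the log-likelihood):

```python
import math

def hessian(f, x, h=1e-4):
    """Numerical Hessian of f at x via central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def fp(di, dj):
                z = list(x)
                z[i] += di
                z[j] += dj
                return f(z)
            H[i][j] = (fp(h, h) - fp(h, -h) - fp(-h, h) + fp(-h, -h)) / (4 * h * h)
    return H

# Stand-in "log-likelihood": a concave quadratic maximized at the origin.
f = lambda z: -(2 * z[0] ** 2 + z[0] * z[1] + 3 * z[1] ** 2)
H = hessian(f, [0.0, 0.0])

# Variance-covariance matrix -H^(-1), inverting the 2x2 matrix by hand.
a, b, c, d = -H[0][0], -H[0][1], -H[1][0], -H[1][1]
det = a * d - b * c
cov = [[d / det, -b / det], [-c / det, a / det]]
se = [math.sqrt(cov[0][0]), math.sqrt(cov[1][1])]   # standard errors
```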

First, define the Hessian for any function. Then we find the Hessian H at the values of our parameter estimates. We can now print a table of the results; we see that in each case the confidence interval has captured the population parameter.

### Traditional Model for Rates, Using Offset

If the dependent variable counts the number of events during a specified time interval t, then the observed rate y/t can be modeled by using the traditional negative binomial model above, with a slight adjustment.

We note that t can also be thought of as an area or subpopulation size, among other interpretations that lead to considering a rate. Since the expected rate is μ/t, we make the following adjustment to model (2) above:

μ = exp(β_0 + β_1 x_1 + … + β_k x_k + ln t).  (4)

This last term, ln t, is called the offset. So in our log-likelihood function, instead of replacing μ_i with exp(x_i β), we replace it with exp(x_i β + ln t_i), resulting in the following:

ln L(β, α) = ∑_(i=1)^n [ y_i ln(αμ_i/(1 + αμ_i)) − (1/α) ln(1 + αμ_i) + ln Γ(y_i + 1/α) − ln Γ(y_i + 1) − ln Γ(1/α) ],  where μ_i = exp(x_i β + ln t_i).  (5)
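The offset adjustment itself is a one-liner. In this Python sketch (the coefficients and the exposure t are made-up values), dividing the resulting mean by t recovers the modeled rate exp(xβ):

```python
import math

def mu_with_offset(beta, x, t):
    """Mean under the rate model (4): mu = exp(x.beta + ln t) = t * exp(x.beta)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)) + math.log(t))

beta = [0.2, -0.5]          # hypothetical coefficients (intercept + one indicator)
x, t = [1.0, 1.0], 100.0    # t = exposure, e.g. the number of cases
mu = mu_with_offset(beta, x, t)
rate = mu / t               # equals exp(x.beta): the modeled rate
```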

Then we proceed as before, maximizing the new log-likelihood function in order to estimate the parameters.

### Traditional Model with Offset for the Titanic Data

The Titanic survival data, available from [2] and analyzed in [1] using R and Stata, are summarized in Table 1, with crew members deleted. Why did fewer first-class children survive than second-class or third-class children? Was it because first-class children were at extra risk? No, it was because there were fewer first-class children on board the Titanic in the first place.

So we do not want to model the raw number of survivors; instead, we want to model the proportion of survivors, which is the survival rate. So in (4), t needs to be the number of cases. We set up the design matrix with indicators (1 for adults, 1 for males) and with indicator variables for second class and third class, which means first class will be the reference category.

But more useful for interpretation of the coefficients is the incidence rate ratio (IRR) for each variable, which is obtained by exponentiating each coefficient. For example, out of a sample of adults, we expect from our model (4) that the survival rate will be given by evaluating exp(xβ̂) with the adult indicator set to 1, while for an identical number of children we expect the survival rate obtained by setting that indicator to 0.

So by dividing the two rates, we obtain the ratio of rates, the IRR, which is the exponential of the adult coefficient. Thus, our interpretation is that adults survived at roughly half the rate at which children survived, among those of the same sex and class.

The standard error of the IRR is found by multiplying the estimated IRR by the standard error of the coefficient (see [1]), while a confidence interval for the IRR is found by exponentiating the confidence interval for the coefficient. Thus we obtain the following. We do not need the IRR for the intercept or for α, so we drop them and then print the resulting table.
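These two rules can be sketched as follows (Python; the coefficient −0.70 and standard error 0.10 are made-up values, not the article's estimates):

```python
import math

def irr_summary(coef, se, z=1.959964):
    """IRR, its delta-method standard error, and a 95% Wald CI obtained by
    exponentiating the CI for the coefficient."""
    irr = math.exp(coef)
    return {"irr": irr,
            "se": irr * se,                              # SE(IRR) = IRR * SE(coef)
            "ci": (math.exp(coef - z * se), math.exp(coef + z * se))}

row = irr_summary(-0.70, 0.10)   # hypothetical coefficient, e.g. for "adult"
```

Since exponentiation is monotone, the exponentiated CI endpoints bracket the IRR itself, and the interval excludes 1 exactly when the coefficient's interval excludes 0.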

The confidence interval for the variable class2 contains 1. We will address this after computing some model assessment statistics and residuals. Various types of model fit statistics and residuals are readily computed. We use definitions given in [1]; alternate definitions exist and would require only minor changes. We already have the log-likelihood as a byproduct of the maximization process. The deviance is defined as twice the difference between the saturated and the fitted log-likelihoods. For our NB2 model, this simplifies to

D = ∑_(i=1)^n d_i,  where  d_i = 2 [ y_i ln(y_i/μ̂_i) − (y_i + 1/α̂) ln((1 + α̂ y_i)/(1 + α̂ μ̂_i)) ].  (6)
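The per-observation deviance contributions d_i can be computed directly from (6). In this Python sketch the y and μ̂ values are made up, chosen so that two observations have y_i = μ̂_i and hence d_i = 0:

```python
import math

def nb2_deviance_terms(y, mu, alpha):
    """Per-observation NB2 deviance contributions
    d_i = 2*[ y*ln(y/mu) - (y + 1/alpha)*ln((1 + alpha*y)/(1 + alpha*mu)) ],
    with y*ln(y/mu) taken as 0 when y = 0."""
    out = []
    for yi, mi in zip(y, mu):
        t1 = yi * math.log(yi / mi) if yi > 0 else 0.0
        t2 = (yi + 1.0 / alpha) * math.log((1 + alpha * yi) / (1 + alpha * mi))
        out.append(2.0 * (t1 - t2))
    return out

# Made-up observations and fitted means; entries 1 and 3 have y = mu exactly.
d = nb2_deviance_terms([0, 2, 5, 3], [1.2, 2.0, 3.5, 3.0], alpha=0.5)
deviance = sum(d)
```

Each d_i is nonnegative, since the saturated model's log-likelihood dominates the fitted one observation by observation.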

These model assessment statistics are most useful when compared to those of a competing model, which we pursue in the next section after computing residuals. The raw residuals are of course y_i − μ̂_i, while the Pearson residuals are (y_i − μ̂_i)/√(μ̂_i + α̂ μ̂_i²), and the deviance residuals are sign(y_i − μ̂_i) √d_i, with d_i as defined in (6).

These residuals can be standardized by dividing by √(1 − h_i), where the h_i are the leverages obtained from the diagonal of the hat matrix W^(1/2) X (Xᵀ W X)⁻¹ Xᵀ W^(1/2), with W equal to the diagonal matrix having μ̂_i/(1 + α̂ μ̂_i) as the ith element of its diagonal. Hilbe recommends plotting the standardized Pearson residuals versus the fitted values μ̂_i, with a poor model fit indicated by residuals outside the interval [−2, 2] when the leverage is high. We have two standardized Pearson residuals that are not within that range, one of which has a high leverage.
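The leverages and standardized Pearson residuals can be sketched as follows (Python, hand-inverting the 2 × 2 matrix XᵀWX; the small design matrix, counts, and fitted means are made up for illustration). A useful check is that the leverages sum to the number of columns of X:

```python
import math

def leverages(X, mu, alpha):
    """Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2),
    with W = diag(mu_i / (1 + alpha*mu_i)); two-column X, inverted by hand."""
    w = [m / (1 + alpha * m) for m in mu]
    s00 = sum(wi * xi[0] * xi[0] for wi, xi in zip(w, X))
    s01 = sum(wi * xi[0] * xi[1] for wi, xi in zip(w, X))
    s11 = sum(wi * xi[1] * xi[1] for wi, xi in zip(w, X))
    det = s00 * s11 - s01 * s01
    inv00, inv01, inv11 = s11 / det, -s01 / det, s00 / det
    return [wi * (inv00 * xi[0] ** 2 + 2 * inv01 * xi[0] * xi[1] + inv11 * xi[1] ** 2)
            for wi, xi in zip(w, X)]

# Made-up example: intercept plus one indicator variable.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
mu = [1.5, 1.5, 2.5, 2.5, 2.5]
y = [1, 2, 2, 4, 1]
alpha = 0.4
h = leverages(X, mu, alpha)

# Standardized Pearson residuals: (y - mu) / sqrt((mu + alpha*mu^2) * (1 - h)).
std_pearson = [(yi - mi) / math.sqrt((mi + alpha * mi * mi) * (1 - hi))
               for yi, mi, hi in zip(y, mu, h)]
```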

We also recall that the variable class2 was not significant. Perhaps the model will be improved if we remove class2. All that is required is to remove class2 from the design matrix, remove the corresponding starting value from the maximizing command, and run the model again.

We set up the design matrix and find the coefficients. We obtain the following assessment statistics and standardized residuals for the revised model with class2 removed. Comparing to the full model, we see that the assessment statistics have improved (they are smaller, indicating a better fit), and the standardized Pearson residuals with high leverages are within the recommended boundaries. It appears that the model has been improved by dropping class2.

The traditional negative binomial regression model (NB2) was implemented by maximum likelihood estimation without much difficulty, thanks to the maximization command and especially to the automatic computation of the standard errors via the Hessian.

Other negative binomial models, such as the zero-truncated, zero-inflated, hurdle, and censored models, could likewise be implemented by merely changing the likelihood function.

The author acknowledges suggestions and assistance by the editor and the referee that helped to improve this article.


[1] J. M. Hilbe, Negative Binomial Regression, 2nd ed., Cambridge University Press, 2011.

[2] Titanic survival data, www.



A coauthor and I recently encountered a bit of uncertainty regarding an underlying assumption of negative binomial regression (NBREG) and were wondering if anyone had any advice on how to proceed. Our question centers on whether the NBREG model is capable of handling interdependence between counts, and, if so, what kind of interdependence it is designed to capture. In examples, overdispersion is often attributed to one of two causal mechanisms. The first is unobserved heterogeneity: a common example is the number of published papers an assistant professor produces in a year.

We cannot assume the rate of publication is constant because professors will vary in their productivity for a number of reasons that are specific to each individual. A similar example has to do with how well sports teams perform across a season. Some teams will score at a higher rate than others because of a variable we cannot observe. In these examples, there is an interdependence within individual professors and within individual teams.

The second mechanism is contagion: here the individual counts are not independent of one another, because success in one period might encourage the subject to make another attempt. For example, a successful sales pitch on Wednesday for a door-to-door salesman may encourage him to try again on Thursday. Another example might be the number of violent episodes mentally ill patients undergo in a given year.

Under this causal mechanism the contagion effect, or interdependence, is across time. One paper, in fact, went to great lengths to demonstrate why and how current NBREG models need to be modified to be capable of handling non-independence. If NBREG models can handle non-independence, which kind of non-independence are they meant to handle?

There are at least 12 distinct probabilistic processes that can give rise to a negative binomial distribution (Boswell and Patil). In my field, statistical ecology, three of these are often applicable: (1) heterogeneity in the Poisson intensity parameter (the negative binomial arises as a gamma mixing distribution for a heterogeneous Poisson distribution), (2) grid sampling from a clustered population (the negative binomial arises as a generalized Poisson model with Poisson-distributed clusters and log-series counts within a cluster), and (3) the outcome probability changes depending on the process history (the negative binomial arises as a limiting distribution of a Polya-Eggenberger urn model).
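The first of these processes is easy to simulate. In this Python sketch (the parameter values μ = 4 and α = 0.8 are arbitrary), drawing the Poisson rate from a gamma distribution with mean μ produces counts whose sample variance clearly exceeds the sample mean, as the NB2 variance μ + αμ² predicts:

```python
import math, random

random.seed(1)

def poisson(lam):
    """Knuth's Poisson sampler (adequate for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def poisson_gamma_draw(mu, alpha):
    """Heterogeneous Poisson: a gamma-distributed rate with mean mu yields an
    NB2 count with variance mu + alpha*mu**2 > mu (overdispersion)."""
    lam = random.gammavariate(1.0 / alpha, alpha * mu)
    return poisson(lam)

sample = [poisson_gamma_draw(4.0, 0.8) for _ in range(5000)]
m = sum(sample) / len(sample)                             # ~4
v = sum((s - m) ** 2 for s in sample) / (len(sample) - 1)  # ~4 + 0.8*16 = 16.8
```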

The causal mechanisms you mention could be interpreted as examples of the first and third of these processes. Separate from these theoretical considerations, the use of a negative binomial model can also be motivated by the nature of the mean-variance relationship of the response. I discuss some of these issues in a course I teach. I'm flying blind here (I don't have a copy with me), but the book Univariate Discrete Distributions (Johnson, Kotz, and Kemp) should be worth browsing for further information on the distribution.

It's one of those books which, in an ideal world, would be on every statistician's shelf.

In the sense that the interdependence can be dealt with as a hidden variable, yes, it deals with it. But we can do much better. The "hidden variable" could be anything leading to overdispersion. If you look at nonlinear mixed models, then you can include the time variable as a random effect. The book by Joseph Hilbe titled Negative Binomial Regression (Cambridge University Press) should answer some of the questions raised in this discussion.

I address this issue in my recently released book (Hilbe, Joseph M.). Basically, the negative binomial can be used to model unidentified correlation in the data, regardless of the cause. When we can identify the reason for the extra correlation, then one can use a model appropriate for the data, which may be a negative binomial, or not.

Of course, there are a variety of negative binomial models, each of which addresses certain types of data situations. Note also that, like the Poisson, the negative binomial can itself be overdispersed. Typically in such situations one can use a random intercept, or a random coefficient, or a host of other adjustments. I noticed that one of the statisticians commenting on this query asserts that the negative binomial is a type of Poisson-gamma mixture.

The NB-2 (traditional version) and NB-1 (constant dispersion) can be derived in that manner, but the negative binomial need not be considered in that manner at all. But this is all discussed in the book.