#### Date of Award

2011

#### Degree Type

Dissertation

#### Degree Name

Ph.D.

#### Department

Mathematics and Statistics

#### First Advisor

Paul, Sudhir (Mathematics and Statistics)

#### Keywords

Statistics.

#### Rights

CC BY-NC-ND 4.0

#### Abstract

Clustered (including longitudinal) count data arise in many biostatistical applications in which repeated responses are observed over time on a number of individuals. One important practical problem is to test homogeneity within clusters (individuals) and between clusters (individuals). Because the observations within a cluster are repeated responses, the count data may be correlated and/or over-dispersed. Jacqmin-Gadda and Commenges (1995) derive two score test statistics: $H_S$, obtained by assuming a random intercept model within the framework of the generalized linear mixed model and computing the exact variance of the likelihood score under the null hypothesis of homogeneity, and $H_T$, obtained using the generalized estimating equation (GEE) approach (Liang and Zeger, 1986; Zeger and Liang, 1986). They further show that the two tests are identical when the covariance matrix assumed in the GEE approach is that of the random-effects model. In each case they treat (a) the situation in which the dispersion parameter $\phi$ is assumed to be known and (b) the situation in which $\phi$ is assumed to be unknown. The second situation is more realistic, since $\phi$ is generally unknown in practice. For over-dispersed count data with an unknown over-dispersion parameter, we use the score test procedure of Rao (1947) and derive three tests by assuming a random intercept model within the framework of (i) the over-dispersed generalized linear model, (ii) the negative binomial model, and (iii) the double extended quasi-likelihood model (Lee and Nelder, 2001). All three statistics are much simpler than the statistic obtained from $H_S$ of Jacqmin-Gadda and Commenges (1995) under the framework of the over-dispersed generalized linear mixed effects model.
The second statistic incorporates the over-dispersion more directly into the model and is therefore expected to perform well when the model assumptions are satisfied, whereas the other statistics are expected to be robust. Simulations show that the statistics derived under the negative binomial and double extended quasi-likelihood model assumptions have superior level properties. In addition, two score tests are developed to test for over-dispersion in the generalized linear mixed model. The four score tests of homogeneity and the two score tests for over-dispersion are applied to two real-life data sets. A plan for future research is outlined.
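To give a concrete sense of what a random-intercept score test of homogeneity computes, here is a minimal sketch in the simplest known-dispersion case. It assumes a Poisson model with log link and a normal random intercept $b_i \sim N(0, \tau)$, so that homogeneity corresponds to $H_0: \tau = 0$, and uses the textbook score for $\tau$ at $\tau = 0$, $U = \tfrac{1}{2}\sum_i \big[ (\sum_j (y_{ij} - \mu_{ij}))^2 - \sum_j \mu_{ij} \big]$. This is an illustration only, not $H_S$, $H_T$, or any of the tests derived in the dissertation: the variance used here treats the fitted means $\mu_{ij}$ as known and omits the correction for estimating regression parameters that an exact-variance statistic would include, and it does not handle an unknown over-dispersion parameter. The function name `poisson_random_intercept_score` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def poisson_random_intercept_score(
    y: Sequence[Sequence[float]],
    mu: Sequence[Sequence[float]],
) -> Tuple[float, float, float]:
    """Hypothetical helper: score for tau at tau = 0 (Poisson, log link, random intercept).

    y[i][j]  : count for observation j in cluster i
    mu[i][j] : fitted mean under H0 (treated as KNOWN; no estimation correction)
    Returns (U, var_U, z) with z = U / sqrt(var_U).
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same number of clusters")
    u = 0.0
    var_u = 0.0
    for y_i, mu_i in zip(y, mu):
        if len(y_i) != len(mu_i):
            raise ValueError("cluster sizes of y and mu must match")
        resid_sum = sum(yij - mij for yij, mij in zip(y_i, mu_i))  # sum_j (y_ij - mu_ij)
        m_i = sum(mu_i)                                            # sum_j mu_ij
        u += 0.5 * (resid_sum ** 2 - m_i)
        # If Y ~ Poisson(m), then Var[(Y - m)^2] = m + 2 m^2 (known-mean assumption).
        var_u += 0.25 * (m_i + 2.0 * m_i ** 2)
    z = u / math.sqrt(var_u)
    return u, var_u, z


# Usage: intercept-only null model, so every fitted mean equals the grand mean.
counts = [[1, 2], [3, 4]]
grand_mean = sum(map(sum, counts)) / sum(map(len, counts))
fitted = [[grand_mean] * len(c) for c in counts]
u, var_u, z = poisson_random_intercept_score(counts, fitted)
print(u, var_u, z)
```

Because $\tau \ge 0$, a test of this kind is one-sided: a large positive $z$ points toward between-cluster heterogeneity, while values near or below zero are consistent with homogeneity.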

#### Recommended Citation

Azad, Kazi, "Some Inference Problems in Clustered (Longitudinal) Count Data with Over-dispersion" (2011). *Electronic Theses and Dissertations*. 420.

http://scholar.uwindsor.ca/etd/420