Date of Award


Publication Type

Doctoral Thesis

Degree Name



Mathematics and Statistics

First Advisor

Sudhir Paul


Pure sciences, Count data, Ratio estimators, Regression estimators, Variance function




Clustered binary data arise in many fields, such as epidemiology, toxicology, econometrics and pharmacokinetic modelling. For instance, in many epidemiological studies the purpose of the investigation is to compare the risk between two groups, where each group consists of clustered observations. Several methods have been developed in the literature for interval estimation of epidemiological indices such as the risk difference, the risk ratio and the relative difference. In this dissertation we introduce two very simple methods. One is based on an estimator of the variance of a ratio estimator; the other is based on a sandwich estimator of the variance of the regression estimator obtained using the generalized estimating equations (GEE) approach. We then compare these two methods, by simulation, with four methods discussed earlier in the literature, in terms of maintaining nominal coverage probability and average coverage length. It is shown that the method based on an estimator of the variance of a ratio estimator performs better in terms of coverage probability, symmetry and bias. The proposed methods are then applied to analyze datasets from toxicology and from an educational intervention program.
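
As a rough illustration of the first idea, the sketch below computes a pooled proportion for clustered binary data, a textbook ratio-estimator variance for it, and a Wald-type interval for the risk difference between two groups. This is a minimal sketch under stated assumptions, not necessarily the estimator used in the dissertation: the function names, the variance formula with its m/(m-1) correction, the normal quantile 1.96 and the cluster data are all illustrative.

```python
import math


def ratio_estimate(successes, sizes):
    """Pooled proportion and a ratio-estimator variance for clustered binary data.

    successes[i] is the number of events in cluster i, sizes[i] its size.
    Assumed (textbook) form: m/(m-1) * sum((y_i - n_i*p)^2) / (sum n_i)^2.
    """
    m = len(sizes)
    total_n = sum(sizes)
    p_hat = sum(successes) / total_n
    # Squared residuals of the cluster totals around the pooled proportion.
    ss = sum((y - n * p_hat) ** 2 for y, n in zip(successes, sizes))
    var_hat = m / (m - 1) * ss / total_n ** 2
    return p_hat, var_hat


def risk_difference_ci(group1, group2, z=1.96):
    """Wald-type interval for p1 - p2, treating the two groups as independent."""
    p1, v1 = ratio_estimate(*group1)
    p2, v2 = ratio_estimate(*group2)
    diff = p1 - p2
    half_width = z * math.sqrt(v1 + v2)
    return diff - half_width, diff + half_width


# Hypothetical data: (events per cluster, cluster sizes) for each group.
treated = ([1, 2, 3], [4, 4, 4])
control = ([2, 2, 2], [4, 4, 4])
lower, upper = risk_difference_ci(treated, control)
print(f"risk difference interval: ({lower:.3f}, {upper:.3f})")
```

Because the variance is built from between-cluster residuals, it widens the interval when responses within a cluster are correlated, which a naive binomial variance would ignore.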

The phenomenon of overdispersion is also quite common in count data. Overdispersion is suspected when the variance is larger than the mean. In the semi-parametric analysis of overdispersed count data, one often needs to determine an appropriate variance function (mean-variance relationship). For example, in chemical and biological assay problems, controlling the quality of a technique requires adjusting the levels of the experimental factors to bring the mean response to a target value while minimizing the variance. The emphasis is on problems involving simultaneous consideration of both mean and variance, where the latter may be a function of the former. In this dissertation, using a hypothesis testing approach through a broader class of models together with a data analytic approach, we propose an appropriate mean-variance relationship that can be used in the semi-parametric analysis of count data.
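
The informal check described above, variance larger than mean, can be sketched as a simple dispersion index. This is only a descriptive first look, not the hypothesis testing approach developed in the dissertation; the function name and the counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean.

    Values near 1 are consistent with a Poisson model; values well above 1
    suggest overdispersion.
    """
    return variance(counts) / mean(counts)


# Hypothetical count data for illustration only.
counts = [0, 1, 1, 2, 3, 5, 8, 12]
print(f"dispersion index: {dispersion_index(counts):.2f}")
```

An index well above 1 would motivate moving from the Poisson variance function to one in which the variance grows faster than the mean.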