Date of Award


Publication Type

Doctoral Thesis

Degree Name



Mathematics and Statistics


Sudhir Paul




An important issue in regression analysis of longitudinal data is model parsimony, that is, finding a model with as few regression variables as possible while retaining good properties of the parameter estimates. In this vein, joint modelling of the mean and variance, taking into account the intra-subject correlation, has been standard in recent literature (Pourahmadi, 1999, 2000; Ye and Pan, 2006; Leng, Zhang, and Pan, 2010). Zhang, Leng, and Tang (2015) propose joint parametric modelling of the means, variances and correlations by decomposing the correlation matrix via hyperspherical co-ordinates and show that this results in unconstrained parameterization, fast computation, easy interpretation of the parameters, and model parsimony. We investigate the properties of the estimates of the regression parameters under semiparametric modelling of the means and variances and study its impact on model parsimony. An extensive simulation study is conducted. Three data sets, namely a biomedical data set, an environmental data set and a cattle data set, are analysed.

In longitudinal studies, researchers frequently encounter covariates that vary over time (see, for example, Huang, Wu, and Zhou, 2002). We consider a generalized partially linear varying coefficient model for such data and propose a regression spline based approach to estimate the mean and covariance parameters jointly, where the correlation matrix is decomposed via hyperspherical co-ordinates. A simulation study is conducted to investigate the properties of the estimates of the regression parameters in terms of bias and standard error, and the approach is applied to a real data set taken from a multi-center AIDS cohort study.

The problem of model selection in regression analysis through forward selection, backward elimination and stepwise selection has been well developed in the literature. The main assumption there, of course, is that the data are normally distributed, and the main tool is either a t-test or an F-test. However, the properties of these model selection procedures in the framework of generalized linear models are not well known. We study here the properties of these procedures in generalized linear models, of which the normal linear regression model is a special case. The main tools used are the score test, the F-test, and other large-sample tests such as the likelihood ratio test and the Wald test; the AIC and the BIC are also included in the comparison. A systematic simulation study of the properties of these procedures, in terms of level and power, is conducted for normal, Poisson and binomial regression models. Extensions to over-dispersed Poisson and over-dispersed binomial regression models are also given and evaluated. The methods are applied to analyse three data sets.

In practice, an abundance of zero counts often arises in data, in which case a discrete generalized linear model may fail to fit but a zero-inflated generalized linear model can be the ideal choice. Researchers often encounter a large number of covariates in such models and need to decide which are potentially important. To find a parsimonious model, we develop a model selection procedure using the score test, the Wald test and the likelihood ratio test; the AIC and the BIC are also included in the comparison. Simulation studies are carried out to investigate the performance of these procedures, in terms of level and power, for zero-inflated Poisson and zero-inflated binomial regression models. The methodology is illustrated through two real examples.