Thesis Defence: Spline Gaussian Cluster-Weighted Models
October 31 at 10:00 am - 2:00 pm

Ling Xue, supervised by Dr. John Thompson, will defend their thesis titled “Spline Gaussian Cluster-Weighted Models” in partial fulfillment of the requirements for the degree of Master of Science in Mathematics.
An abstract for Ling Xue’s thesis is included below.
Defences are open to all members of the campus community as well as the general public. Registration is not required for in-person defences.
Abstract
Cluster-weighted models (CWMs) are a class of finite mixture models that jointly model the distribution of covariates and the conditional distribution of a response variable given those covariates. They are effective in uncovering latent subpopulations where both the marginal covariate structure and the regression relationship vary across clusters. However, standard Gaussian CWMs require specifying a parametric regression form within each cluster, limiting their ability to capture nonlinear relationships commonly observed in practice.
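For reference, a Gaussian CWM with a linear within-cluster regression is commonly written as the mixture below. The notation is assumed for illustration and is not taken from the thesis: G is the number of clusters, \pi_g are the mixing proportions, \phi is the univariate Gaussian density of the response, and \phi_d is the d-variate Gaussian density of the covariates.

```latex
p(\mathbf{x}, y; \boldsymbol{\theta})
  = \sum_{g=1}^{G} \pi_g \,
    \phi\!\left(y \mid \mathbf{x};\, \beta_{0g} + \boldsymbol{\beta}_g^{\top}\mathbf{x},\, \sigma_g^2\right)
    \phi_d\!\left(\mathbf{x};\, \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g\right)
```

The parametric restriction mentioned above sits in the conditional mean, here the linear form \beta_{0g} + \boldsymbol{\beta}_g^{\top}\mathbf{x}.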
To address this limitation, we propose a spline-based extension of the Gaussian CWM that uses B-spline functions to model the within-cluster regression structure. A roughness penalty on the spline coefficients is introduced to control smoothness and prevent overfitting, and this penalization is incorporated into estimation through a fully unsupervised Expectation–Maximization (EM) algorithm. The EM procedure updates spline coefficients via weighted penalized least squares while estimating variance and mixing proportions. In addition, we introduce a mixed-type kernel spline Gaussian CWM that extends the framework to datasets with both continuous and categorical covariates. Although not empirically evaluated here, the model is fully formulated, and an EM-based estimation scheme is provided.
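The weighted penalized least squares update described above can be sketched for a single cluster as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the roughness penalty is assumed to be a second-order difference penalty on adjacent coefficients (the abstract does not specify its form), the basis matrix B is assumed to be precomputed, and the variance update uses a plain responsibility-weighted residual average.

```python
import numpy as np

def second_difference_penalty(n_basis):
    """Assumed penalty matrix D^T D, where D takes second differences of coefficients."""
    D = np.diff(np.eye(n_basis), n=2, axis=0)  # shape (n_basis - 2, n_basis)
    return D.T @ D

def m_step_cluster(B, y, z, lam):
    """Hypothetical M-step for one cluster.

    B   : (n, k) basis matrix evaluated at the covariate values
    y   : (n,) responses
    z   : (n,) posterior responsibilities for this cluster from the E-step
    lam : non-negative smoothing parameter
    """
    P = second_difference_penalty(B.shape[1])
    BtW = B.T * z  # B^T W with W = diag(z), without forming W
    # Weighted penalized least squares: (B^T W B + lam * P) beta = B^T W y
    beta = np.linalg.solve(BtW @ B + lam * P, BtW @ y)
    resid = y - B @ beta
    sigma2 = np.sum(z * resid**2) / np.sum(z)  # weighted residual variance
    pi_g = np.mean(z)                          # mixing proportion
    return beta, sigma2, pi_g
```

In a full EM loop this function would be called once per cluster in each iteration, after the E-step recomputes the responsibilities. Setting lam to 0 reduces the update to ordinary weighted least squares, which corresponds to the unpenalized spline case.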
We evaluate the spline-based model using simulated and real datasets that capture a range of cluster-specific relationships with varying degrees of overlap and noise. The results indicate that polynomial Gaussian CWMs perform well when the regression structure is globally simple, such as in the Linear Crossover dataset and the real NPreg dataset, where quadratic polynomials recover the generating form effectively. Spline Gaussian CWMs provide greater flexibility in settings with local nonlinearities, such as the Nonlinear Single and Double Crossover datasets, where they align better with the true group-wise regression functions. Penalized spline CWMs show the most stable performance, mitigating overfitting in challenging cases such as the Parallel and Half-Overlapping datasets, and offering improved classification accuracy relative to unpenalized splines. On fluctuating designs, spline-based approaches capture localized variation better than high-degree polynomials, which risk instability at boundaries.
Overall, this thesis develops nonparametric extensions of Gaussian CWMs by integrating spline regression, penalization, and kernel-based methods. These extensions broaden the applicability of CWMs to complex data structures and provide a foundation for future work on mixed-type and nonparametric mixture modelling.