Thesis Defence: Intelligent Surveillance with Multimodal Object Detection in Complex Environments
March 22 at 12:00 pm - 4:00 pm
Yue Cao, supervised by Dr. Zheng Liu, will defend their thesis titled “Intelligent Surveillance with Multimodal Object Detection in Complex Environments” in partial fulfillment of the requirements for the degree of Master of Applied Science in Electrical Engineering.
An abstract for Yue Cao’s thesis is included below.
Defences are open to all members of the campus community as well as the general public. Please email zheng.liu@ubc.ca to receive the Zoom link for this defence.
ABSTRACT
Surveillance systems play a crucial role in ensuring public safety. With the advent of deep learning algorithms, these systems have evolved from passive monitoring tools that relied heavily on human operators into advanced solutions capable of autonomously analyzing scenes with minimal human input. However, accurately detecting objects of interest in real-world scenarios remains a significant challenge due to dynamic illumination and the varying sizes of objects. This research aims to enhance the accuracy and robustness of intelligent surveillance systems for object detection in complex environments by integrating two complementary sensing modalities: visible-light (RGB) and infrared (IR) images.
First, a multimodal detection framework is proposed, building upon the Faster R-CNN architecture, that integrates features from both RGB and IR images for enhanced object detection. Following this, Poolfuser, a transformer-based fusion module, is introduced and incorporated into the detection framework to fuse features from the two modalities from a spatial perspective, emphasizing the features most critical for target detection. Experimental results show that the multimodal framework equipped with Poolfuser significantly outperforms unimodal detectors and other competing multimodal approaches in detection accuracy in complex environments.
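The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of what a pooling-based spatial fusion of RGB and IR feature maps could look like in general; the module name `PoolFusionBlock`, the use of average pooling as the token mixer, and every parameter choice below are illustrative assumptions, not the thesis's Poolfuser implementation.

```python
# Hypothetical sketch (not the thesis code): fusing RGB and IR feature maps with a
# pooling-based token mixer in the spirit of a transformer fusion block.
import torch
import torch.nn as nn


class PoolFusionBlock(nn.Module):
    """Concatenate RGB and IR features, then mix them spatially via pooling."""

    def __init__(self, channels: int, pool_size: int = 3):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)  # merge modalities
        self.norm = nn.GroupNorm(1, channels)
        # Average pooling as a cheap spatial token mixer.
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2,
                                 count_include_pad=False)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, 4 * channels, 1), nn.GELU(),
            nn.Conv2d(4 * channels, channels, 1),
        )

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        x = self.reduce(torch.cat([rgb_feat, ir_feat], dim=1))   # (B, C, H, W)
        x = x + (self.pool(self.norm(x)) - self.norm(x))         # spatial mixing
        x = x + self.mlp(self.norm(x))                           # per-location MLP
        return x                                                 # fused feature map


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 64, 80)   # e.g. one feature level from the RGB stream
    ir = torch.randn(2, 256, 64, 80)    # matching feature map from the IR stream
    fused = PoolFusionBlock(256)(rgb, ir)
    print(fused.shape)                  # torch.Size([2, 256, 64, 80])
```

A fused map of this kind would replace the single-modality feature map fed to the region proposal network and detection head of a Faster R-CNN style detector.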
Second, to enhance both the efficiency and accuracy of the detection model without increasing computational complexity, a lightweight CNN-based fusion module named CSSA is introduced. This module fuses the input features from both the spatial and channel perspectives. Furthermore, its CNN-based architecture improves the generalizability of CSSA across datasets of varying sizes. The experimental results demonstrate that the CSSA module further improves detection accuracy without affecting the real-time performance of the detection framework.
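As an illustration only, the sketch below shows one common way a lightweight CNN module can re-weight concatenated RGB and IR features first along the channel dimension and then along the spatial dimension; the class name `ChannelSpatialFusion` and all hyperparameters are assumptions and are not drawn from the thesis's CSSA design.

```python
# Hypothetical sketch (not the thesis code): lightweight channel-then-spatial
# attention over concatenated RGB and IR feature maps.
import torch
import torch.nn as nn


class ChannelSpatialFusion(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatially, excite per channel.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, (2 * channels) // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d((2 * channels) // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: one convolution over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, ir_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, ir_feat], dim=1)                # (B, 2C, H, W)
        x = x * self.channel_gate(x)                             # re-weight channels
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        x = x * self.spatial_gate(stats)                         # re-weight locations
        return self.project(x)                                   # back to C channels


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 64, 80)
    ir = torch.randn(2, 256, 64, 80)
    print(ChannelSpatialFusion(256)(rgb, ir).shape)              # torch.Size([2, 256, 64, 80])
```

Because such a module uses only pooling, 1x1 convolutions, and a single 7x7 convolution, its overhead is small, which is consistent with the abstract's claim that real-time performance is preserved.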
Finally, considering the impact of other components, such as the backbone network and the loss function, on detection performance, this study further optimizes the CSSA-based multimodal detection model and introduces CSSA-Det. CSSA-Det shows improved object detection performance over CSSA and other state-of-the-art multimodal frameworks, particularly in the accuracy of bounding box localization.
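The abstract does not name the loss function used in CSSA-Det; purely as a generic example of a regression loss aimed at tighter bounding box localization, the sketch below implements the widely used generalized IoU (GIoU) loss.

```python
# Illustrative only: a generic GIoU box-regression loss, one common choice for
# improving localization accuracy. Not stated to be the loss used in CSSA-Det.
import torch


def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Boxes in (x1, y1, x2, y2) format, shape (N, 4). Returns the mean GIoU loss."""
    # Intersection.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    # Union and IoU.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)

    # Smallest enclosing box penalizes predictions far from the target.
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    enclose = ((cx2 - cx1) * (cy2 - cy1)).clamp(min=1e-7)

    giou = iou - (enclose - union) / enclose
    return (1.0 - giou).mean()
```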