Thesis Defence: On the Understanding of Software Engineering Related Texts via the Transfer of Prior Knowledge
December 2, 2022 at 11:00 am - 2:00 pm
Mohammad Abdul Hadi, supervised by Dr. Fatemeh H. Fard, will defend their thesis titled “On the Understanding of Software Engineering Related Texts via the Transfer of Prior Knowledge” in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.
An abstract for Mohammad’s thesis is included below.
Defences are open to all members of the campus community as well as the general public.
If you would like to attend this virtual defence, please contact the supervisor at firstname.lastname@example.org to receive a Zoom link.
Software Engineering (SE) related natural language texts, such as app reviews and crowdsourced questions and answers (Q&A), play pivotal roles in helping software engineers and developers gain knowledge about different stages of the software life cycle, such as application development, deployment, and maintenance. Effective classification and clustering of these texts can help developers and engineers quickly understand and process large volumes of information. Therefore, I focus my study on the efficient classification and clustering of SE-related texts using state-of-the-art neural language models and adaptive topic modeling techniques, respectively.

We completed an extensive empirical study to understand the strengths, effectiveness, and competence of pre-trained Transformer-based neural language models (PTMs) for the app review classification task, and identified the best-performing PTMs. I also pre-trained two of the best-performing models from scratch on domain-specific data to improve classification performance. For this pre-training, I scraped the Google Play Store and collected the largest domain-specific app review dataset.

For clustering SE texts, I proposed a new online adaptive topic model, the Adaptive Online Bi-term Topic Model (AOBTM), which efficiently identifies topics in corpora divided into time or version slices. AOBTM leverages and adapts the statistical information inferred from preceding slices and can incorporate the latest slices as needed. The approach yields good results on short and noisy SE-related natural language texts.
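The abstract names a bi-term topic model without saying what a biterm is. In the Biterm Topic Model family, the model learns from unordered pairs of words that co-occur in the same short text instead of from individual documents, which helps with sparse texts such as app reviews. The Python sketch below illustrates only that preprocessing step, counting biterms within one slice of texts. It is not AOBTM itself, and the function names (`extract_biterms`, `count_biterms`) and sample reviews are hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) that co-occurs in one short text."""
    # Sorting each pair makes (a, b) and (b, a) the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(texts):
    """Count biterms across one corpus slice (e.g., all reviews for one app version)."""
    counts = Counter()
    for text in texts:
        # Naive whitespace tokenization (hypothetical simplification); a real
        # pipeline would also strip punctuation and drop stop words.
        counts.update(extract_biterms(text.lower().split()))
    return counts


if __name__ == "__main__":
    # Hypothetical reviews standing in for one version slice.
    slice_v1 = ["app crashes after update", "login crashes app"]
    print(count_biterms(slice_v1).most_common(3))
```

Because each short text contributes all of its word pairs, even a four-word review yields six biterms. This gives a topic model more co-occurrence evidence than document-level word counts alone.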