Low Resource RAG: From Slide Data Processing to RAG Systems
For my Honors Thesis, I developed a retrieval-augmented generation (RAG) system with Hyundai that answers questions about automotive collision safety tests using multimodal slides, and found that fine-tuned embedding models achieved the highest accuracy.
Cyberbullying Classification
A collection of models, ranging from classical machine learning to fine-tuned LLMs, for detecting cyberbullying in text messages. Achieved 99% accuracy on the classification task using BERT and RoBERTa models. Won the Best Project Award in CS334: Machine Learning.
Student Dropout Prediction
Trained four tabular deep learning networks to predict whether a student is likely to drop out or graduate, based on 12 features engineered and selected from more than 36. Used feature engineering, hyperparameter tuning, and deep learning models to achieve 91% accuracy.