Low Resource RAG: From Slide Data Processing to RAG Systems

For my Honors Thesis, I develop a retrieval-augmented generation (RAG) system with Hyundai that answers questions about automotive collision safety tests using multimodal slides, and find that fine-tuned embedding models achieve the highest accuracy.

Technologies: Python, LangChain, Hugging Face, Fine-tuning, vLLM, LLM, SLURM

Cross-Domain Generalization of Entropy Probes for Hallucination Detection in Multiple-Choice QA

I investigate how well entropy probes for hallucination detection in multiple-choice QA generalize across domains. Training linear and nonlinear probes on MMLU domains, I find that single-sample probes can approach multi-sample baseline performance while being 10x more computationally efficient.

Technologies: Python, Jupyter Notebook, Hugging Face, Probing, ML, LLM, Scikit-learn

Cyberbullying Classification

A collection of models, ranging from classical machine learning to fine-tuned LLMs, that detect cyberbullying in text messages. Achieved 99% accuracy on the classification task using BERT and RoBERTa models. Won the Best Project Award in CS334: Machine Learning.

Technologies: Python, Hugging Face, PyTorch, Scikit-learn, Git

Thermal Image Data Processing and Analysis Tool

A Python annotation tool for processing FLIR thermal images: extracting metadata, performing thermal analysis, aligning images, and generating binary masks for regions of interest.

Technologies: Python, cv2, Pillow, flyrpy, EXIF, numpy, GitHub

Student Dropout Prediction

Trained four tabular deep learning networks to predict whether a student is likely to drop out or graduate, using 12 features generated and selected from more than 36 candidates. Combined feature engineering and hyperparameter tuning to achieve 91% accuracy.

Technologies: Python, PyTorch, Jupyter, Scikit-learn, Feature Engineering, Git
© 2024 Andrew Chung / Developed with SvelteKit, Vite, TypeScript, Figma / Inspired by oklama.com