Carnegie Mellon University
PES University
Multimodal Document Question-Answering on Research Papers
Improved multimodal document QA on scholarly articles by fine-tuning Qwen-VL-7B for +15% QA accuracy and fine-tuning CLIP to boost image retrieval accuracy by 12%.
Reward Model-Guided Slide Generation
Trained SmolVLM-500M as a reward model with self-refinement and inference-time scaling, leading to a 28% performance gain on AutoPresent.
Exploring Impact of Code in Pre-training
Implemented continued pretraining of GPT-medium on code data, demonstrating a ~5-7% improvement across popular LLM benchmarks available in EleutherAI's lm-eval-harness.
Attention Based Evolutionary Approach for Image Classification
Achieved SOTA accuracy on CIFAR-10 with 50% fewer generations than baseline NAS (Neural Architecture Search).
View Publication
System And Method for Clustering and Categorizing Large Datasets
Patented system combining text embeddings with dimensionality reduction and clustering techniques; facilitated efficient document retrieval and categorization for deriving business insights.
Automated Workflow for Deepfake Detection
Deployed a bidirectional LSTM model as an API with 98% accuracy; cut the parameter count by 100x for efficient deepfake detection.
View Publication